Named query with many conditional WHERE clauses without query string?

bkarabinchak.psi · June 17, 2021, 12:18pm

This is a class of problem I have yet to find a good way to deal with handling and wanted some opinions. I have a table with data and above it 12 different drop downs and 1 text field all of which contribute to the WHERE clause of the query. For security and caching sake I want to try to convert this to a named query but the only two ways I can think of

Use query strings - I still leave myself open to SQL injection (albeit less but still there)
Make a named query per filter combination - with 13 different options for the user, this would mean hundreds of named queries.

To make it concrete here are the filters available -

When All is selected for the dropdown, I don’t need any where clause for that item, but if something for the dropdown Contact is selected for instance, then I would want WHERE contact=something

Any thoughts on a clean approach that doesn’t use query strings? I hesitate to make the filters work via scripting as I have to think my database can do a WHERE clause filter much faster than jython could iterate through tens thousands of records. All opinions welcome.

dkhayes117 · June 17, 2021, 12:28pm

This Thread might help some
Best Method for Searching Database with Power Table - Ignition - Inductive Automation Forum

lrose · June 17, 2021, 12:33pm

Honestly, I think I would probably build a scripted solution as a project script, and just build the query and use runPrepQuery.

Something like:

def runFilteredQuery(filters = []):
    if filters:
        #build where clause
    #run query

Doesn’t have all the advantages of a Named Query, but keeps the script in a single location for easy editing and prevents SQL Injection.

Claus_Nielsen · June 17, 2021, 12:35pm

Depending on the amount of data you have, you could just filter on the raw data from an unfiltered query.

bkarabinchak.psi · June 17, 2021, 12:38pm

Just a note that we’ve had a few security scares recently and so I am trying to avoid anything that still requires Legacy Database Access. I really want to be able to check that box off.

Maybe I can use @dkhayes117 and @lrose and write these functions as scripts that run on the gateway via a message handler. That might be my best bet. But other opinions are still welcome. I’m sure everyone faces this problem and I’m curious how its “best” handled.

dkhayes117 · June 17, 2021, 12:39pm

That is how I run it, as a gateway message handler

Claus_Nielsen · June 17, 2021, 12:51pm

Filtering on an unfiltered query in a script doesn’t require legacy access. But it really depends on how much data you get in an unfiltered query and if it’s ever needed to have an unfiltered query.

Something like this may work. (Written here in the reply field)

def filter(namedQueryName, namedQueryParams, filters):
	"""
		namedQueryName: Name of the query.
		namedQueryParams: Parameter dict for the named query
		filters: Dict of filters to apply where the key is the column name and the value is the filter
	"""
	res = system.db.runNamedQuery(namedQueryName, namedQueryParams)
	
	headers = [c for c in res.getColumnNames()]
	rows = []
	
	for r in range(res.rowCount):
		include = True
		for c, f in filters.items():
			if res.getValueAt(r, c) != f:
				include = False
				break
				
		if include:
			rows.append([res.getValueAt(r, c) for c in headers])
				
			
	return system.dataset.toDataSet(headers, rows)

lrose · June 17, 2021, 12:54pm

From the OP:

I tend to agree, the DB will be more performant than a script, particularly if the tables are indexed properly.

matthew.ayre · June 17, 2021, 12:54pm

You can use CASE statements in WHERE clauses. They’re not very pretty to look at, but they do the job for dynamic WHERE clauses whilst reducing the risk of SQL injection (dynamic queries )

For example:

SELECT *
FROM myTable
WHERE 
    myCol = CASE WHEN @myParam = 'All'
        THEN myCol
        ELSE @myParam
    myCol2 = CASE WHEN @myParam2 = 'All'
        THEN myCol2
        ELSE @myParam2

lrose · June 17, 2021, 12:57pm

is right. A single where clause potentially resolving to 10+ no-ops.

Would definitely work though.

Claus_Nielsen · June 17, 2021, 1:05pm

It’s very unlikely there’s an index on every column that can be filtered on so the query would not be 100% optimized anyways. Indexing the columns that are filtered most often may help or the columns that can remove big chunks of data quickly. It all depends on so many things.

bkarabinchak.psi · June 17, 2021, 1:06pm

I will say the database is well optimized - it was one of the first things I worked on here. Yes not every filterable column is indexed but the worst offenders most definitely are now.

Claus_Nielsen · June 17, 2021, 1:11pm

I’m using filters in my own queries and I tend to go with the “filter on raw dataset” way. Performance are rarely an issue and is easy to implement and should be easy enough to change when you hit the wall

bkarabinchak.psi · June 17, 2021, 1:19pm

Hope I don’t offend anyone if I don’t choose a solution as it does seem to be conditional on the project.

For people in the future reading this thread theses are your options

General named query and jython filtering
Pros: Named query is very simple and with proper db planning you can probably use the same jython filter function for all your tables/filters
Cons: The more records you have, the more of a performance hit you will get - but again depends on specs
Gateway Script Dynamic SQL
Pros: No no-ops in the db, no need to write SQL
Cons: Need to make a whitelist of tables to allow otherwise a spoofed client could probably use this to see more of your db than you’d like. For complicated queries that don’t fit simple SELECT * FROM table, you might need to create a map of caveats. But if you’re white listing tables that shouldn’t be too problematic.
Named Query with conditional Where clauses
Pros: All your logic in one place, all you have to do is call the named query with parameters without any extra steps, no extra security steps needed
Cons: Some incredibly ugly SQL

Thanks for the suggestions, I have to think about it. If anyone else has any ideas or want suggest any other pros/cons for their method of choice feel free. Otherrwise I think I have my answer(s).

pturmel · June 17, 2021, 1:24pm

FWIW, I would choose #2. Note that you should also whitelist the columns you allow. Partly because a gateway message handler could do custom caching based on the criteria supplied.

bkarabinchak.psi · June 17, 2021, 1:26pm

Message handlers do caching? I did not know that. Could you explain the worst case scenario if columns are not whitelisted?

Personally I was leaning 2 as well. I already have a SQL query builder made and I’d much rather the DB do the filtering when possible.

pturmel · June 17, 2021, 1:37pm

No. I wrote "could do custom caching". Emphasis added.

pturmel · June 17, 2021, 1:42pm

Same as if tables aren't whitelisted. SQL injection from a client. Table names and column names are part of a query's structure and therefore cannot be supplied by a ? substitution. Everything that isn't passed via ? substitution needs to be whitelisted. If you like, you can use a DB's information_schema to populate your whitelist.

bkarabinchak.psi · June 17, 2021, 1:50pm

Ok I understand why you need to whitelist columns to also prevent SQL injection.

Could you explain how to do custom caching? Caching was one of the main benefits I saw to trying to use a named query, but if it can be done manually that definitively makes the second method my favorite. My initial thought is to make custom dataset tags that hold the last query result and then updating them when appropriate when a row is updated/deleted/inserted. So if I had implemented method two and the first person does a select, I have that now in my tag, and for the next person does a select as well (without any Create/Update/Delete operations in between) then I would just return the tag value, instead of running the query again. Is that how you would do it or is there a better way?

PGriffith · June 17, 2021, 2:34pm

Don’t use tags - they’re slow, and have poor guarantees. system.util.getGlobals() is probably your best bet. (I’d recommend guarding the actual cache behind a lock if you expect frequent update/select collision).