This is a class of problem I have yet to find a good way to deal with handling and wanted some opinions. I have a table with data and above it 12 different drop downs and 1 text field all of which contribute to the WHERE clause of the query. For security and caching sake I want to try to convert this to a named query but the only two ways I can think of
Use query strings - I still leave myself open to SQL injection (albeit less but still there)
Make a named query per filter combination - with 13 different options for the user, this would mean hundreds of named queries.
To make it concrete here are the filters available -
When All is selected for the dropdown, I don’t need any where clause for that item, but if something for the dropdown Contact is selected for instance, then I would want WHERE contact=something
Any thoughts on a clean approach that doesn’t use query strings? I hesitate to make the filters work via scripting as I have to think my database can do a WHERE clause filter much faster than jython could iterate through tens thousands of records. All opinions welcome.
Just a note that we’ve had a few security scares recently and so I am trying to avoid anything that still requires Legacy Database Access. I really want to be able to check that box off.
Maybe I can use @dkhayes117 and @lrose and write these functions as scripts that run on the gateway via a message handler. That might be my best bet. But other opinions are still welcome. I’m sure everyone faces this problem and I’m curious how its “best” handled.
Filtering on an unfiltered query in a script doesn’t require legacy access. But it really depends on how much data you get in an unfiltered query and if it’s ever needed to have an unfiltered query.
Something like this may work. (Written here in the reply field)
def filter(namedQueryName, namedQueryParams, filters):
"""
namedQueryName: Name of the query.
namedQueryParams: Parameter dict for the named query
filters: Dict of filters to apply where the key is the column name and the value is the filter
"""
res = system.db.runNamedQuery(namedQueryName, namedQueryParams)
headers = [c for c in res.getColumnNames()]
rows = []
for r in range(res.rowCount):
include = True
for c, f in filters.items():
if res.getValueAt(r, c) != f:
include = False
break
if include:
rows.append([res.getValueAt(r, c) for c in headers])
return system.dataset.toDataSet(headers, rows)
You can use CASE statements in WHERE clauses. They’re not very pretty to look at, but they do the job for dynamic WHERE clauses whilst reducing the risk of SQL injection (dynamic queries )
For example:
SELECT *
FROM myTable
WHERE
myCol = CASE WHEN @myParam = 'All'
THEN myCol
ELSE @myParam
myCol2 = CASE WHEN @myParam2 = 'All'
THEN myCol2
ELSE @myParam2
It’s very unlikely there’s an index on every column that can be filtered on so the query would not be 100% optimized anyways. Indexing the columns that are filtered most often may help or the columns that can remove big chunks of data quickly. It all depends on so many things.
I will say the database is well optimized - it was one of the first things I worked on here. Yes not every filterable column is indexed but the worst offenders most definitely are now.
I’m using filters in my own queries and I tend to go with the “filter on raw dataset” way. Performance are rarely an issue and is easy to implement and should be easy enough to change when you hit the wall
Hope I don’t offend anyone if I don’t choose a solution as it does seem to be conditional on the project.
For people in the future reading this thread theses are your options
General named query and jython filtering
Pros: Named query is very simple and with proper db planning you can probably use the same jython filter function for all your tables/filters
Cons: The more records you have, the more of a performance hit you will get - but again depends on specs
Gateway Script Dynamic SQL
Pros: No no-ops in the db, no need to write SQL
Cons: Need to make a whitelist of tables to allow otherwise a spoofed client could probably use this to see more of your db than you’d like. For complicated queries that don’t fit simple SELECT * FROM table, you might need to create a map of caveats. But if you’re white listing tables that shouldn’t be too problematic.
Named Query with conditional Where clauses
Pros: All your logic in one place, all you have to do is call the named query with parameters without any extra steps, no extra security steps needed
Cons: Some incredibly ugly SQL
Thanks for the suggestions, I have to think about it. If anyone else has any ideas or want suggest any other pros/cons for their method of choice feel free. Otherrwise I think I have my answer(s).
FWIW, I would choose #2. Note that you should also whitelist the columns you allow. Partly because a gateway message handler could do custom caching based on the criteria supplied.
Same as if tables aren't whitelisted. SQL injection from a client. Table names and column names are part of a query's structure and therefore cannot be supplied by a ? substitution. Everything that isn't passed via ? substitution needs to be whitelisted. If you like, you can use a DB's information_schema to populate your whitelist.
Ok I understand why you need to whitelist columns to also prevent SQL injection.
Could you explain how to do custom caching? Caching was one of the main benefits I saw to trying to use a named query, but if it can be done manually that definitively makes the second method my favorite. My initial thought is to make custom dataset tags that hold the last query result and then updating them when appropriate when a row is updated/deleted/inserted. So if I had implemented method two and the first person does a select, I have that now in my tag, and for the next person does a select as well (without any Create/Update/Delete operations in between) then I would just return the tag value, instead of running the query again. Is that how you would do it or is there a better way?
Don’t use tags - they’re slow, and have poor guarantees. system.util.getGlobals() is probably your best bet. (I’d recommend guarding the actual cache behind a lock if you expect frequent update/select collision).