Single Row Insert Performance

Hello,

I’m looking for advice on how to improve the performance of my queries. I have a single table with 12 columns, and I am inserting roughly 1200 rows into this table. Currently I am using a named query that checks if the row exists: if it does, it updates the current values; if not, it inserts a new row.

I’m looking for a way to combine my 1200 single insert queries into fewer queries to improve performance, but I’m not sure how I can do that since I am checking whether each row exists first.

I attempted to use:
system.db.beginNamedQueryTransaction(database, isolationLevel, timeout)

But it seems to have no impact on performance in my case. I was hoping it would combine all of my queries into one and commit them all at once, but that does not appear to be the case.

Any advice is appreciated.

You need to ensure all 1200 rows are unique? Or do they share some common feature that must be unique?
Either way, you could probably batch the validity check to cover all 1200 elements at once, then run a multi-value insert for the 1200 rows.
What DB are you using?
Postgres, for instance, has an ON CONFLICT clause for an insert statement, so you could attempt all 1200 rows optimistically (in a transaction) then roll back if any of them fail:
https://www.postgresql.org/docs/9.5/sql-insert.html#SQL-ON-CONFLICT
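
For illustration, that could look something like this from Ignition scripting (untested sketch; machine_status, machine_id, and status are made-up names, and ON CONFLICT needs a unique constraint on the conflict column):

query = """
    INSERT INTO machine_status (machine_id, status)
    VALUES (?, ?)
    ON CONFLICT (machine_id)
    DO UPDATE SET status = EXCLUDED.status
"""
# Inserts a status row, or updates it in place when machine_id already
# exists (PostgreSQL-only syntax; EXCLUDED refers to the proposed row).
system.db.runPrepUpdate(query, [machineId, status], "statusDB")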

I am using MSSQL. All 1200 rows are unique; they are composed of status information for a machine, each row being a different machine.

I tried to encapsulate this in a transaction using:
txNumber = system.db.beginNamedQueryTransaction(database, isolationLevel, timeout)
system.db.runNamedQuery("updateMachineInfo", params, txNumber)
system.db.commitTransaction(txNumber)
system.db.closeTransaction(txNumber)

But it unfortunately made little to no difference in performance. I think I am going to have to ditch using a named query and dynamically create a giant insert query; however, that’s a big task, so I was looking for a more optimal way.

Is it always the same number of rows/elements? What’s the failure behavior if there is a duplicate?
If it’s a fixed number of rows, you could just write out one massive named query that takes all 1200 * n parameters, which would then run as a single statement on the server.

Note that the point of the ON CONFLICT statement is to not have to roll back. Each row that triggers the conflict gets alternate handling within the same operation.

Unfortunately it’s not a fixed number; it’s approximately 1200 rows now and will grow as we add machines. Duplicates will not necessarily cause failure; however, the plan is to use this as a status database, updating every 60 seconds, so the size of the database would get out of hand quickly if I just use inserts and not updates.

I think I can get around the issue of having to check if something exists or not simply by deleting all rows in the table before inserting new ones, since the values update every minute from a different data source. This should allow me to use a single insert statement and then a list of values, since they will all contain the same elements. For example:

INSERT INTO table_name (column_list)
VALUES
    (value_list_1),
    (value_list_2),
    ...
    (value_list_n);
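
If I do end up building that statement dynamically from a script, here is a rough, untested sketch of what I have in mind (table and column names made up; note that SQL Server allows at most 1000 rows per VALUES list and 2100 parameters per prepared statement, so the rows have to be chunked):

# Untested sketch with made-up names. "rows" is the list of value lists
# parsed from the web service; the real table has 12 columns.
columns = ["machine_id", "status", "cycle_count"]  # stand-ins for the 12 real columns
rowsPerChunk = min(1000, 2100 // len(columns))     # stay under both SQL Server limits
rowSql = "(" + ",".join(["?"] * len(columns)) + ")"

txId = system.db.beginTransaction("statusDB")
try:
    # Every row is refreshed each minute, so wipe the table first.
    system.db.runPrepUpdate("DELETE FROM machine_status", [], tx=txId)
    for i in range(0, len(rows), rowsPerChunk):
        chunk = rows[i:i + rowsPerChunk]
        query = ("INSERT INTO machine_status (" + ",".join(columns) + ") VALUES "
                 + ",".join([rowSql] * len(chunk)))
        args = [v for row in chunk for v in row]   # flatten into one parameter list
        system.db.runPrepUpdate(query, args, tx=txId)
    system.db.commitTransaction(txId)
except:
    system.db.rollbackTransaction(txId)
    raise
finally:
    system.db.closeTransaction(txId)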

You should reconsider using a database as a store of status. Use the WebDev module to expose a status API that just reads current tag values. Reading latest OPC values from gateway memory and pushing through an HTTP(S) pipe is far more efficient than pushing status through a DB.
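
Roughly, the WebDev side could be a python resource like this (untested sketch; tag paths and machine names are made up, and it assumes the status already lives in tags):

def doGet(request, session):
    # Serve current machine status straight from gateway memory as JSON -
    # no database round trip at all.
    machines = ["Machine1", "Machine2"]  # made-up names
    paths = ["[default]Machines/%s/Status" % m for m in machines]
    values = [qv.value for qv in system.tag.readBlocking(paths)]
    return {"json": dict(zip(machines, values))}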

Unfortunately the data does not come from tags; the data itself comes from a web service call, which is the issue I am attempting to solve. The original data source is not hosted on-site, and queries to it take a long time to process and are metered. The idea was to use one big web service call to get all of the data in one go, and then write it to a local database for faster queries that are not metered.

I could perhaps write the data to a tag structure instead and pursue the API method you mentioned. I will have to look into that, and into whether my gateway has the WebDev module.
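
If I go that way, I imagine the write side would be something like this (untested sketch; tag paths and field names made up, and the tags would need to exist first):

# Push the parsed web service results into memory tags in one call.
paths = []
values = []
for machine in statusList:   # statusList parsed from the web service response
    paths.append("[default]Machines/%s/Status" % machine["name"])
    values.append(machine["status"])
system.tag.writeBlocking(paths, values)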

Given that this is machine status data, and you know what machines you are connecting, can you split the problem into two parts?

  1. A task that pre-computes the unique parts of the machine status and does an insert to ensure that a row exists (this only needs to run every so often - just fast enough that it establishes rows before a machine comes online)
  2. The update task that now doesn’t have to check for a row’s existence, so it just does an update (and fails until task #1 has run)

This minimizes the overall DB effort and should help speed things up.
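
Something like this, perhaps (untested sketch, names made up):

# Task 1 - run every so often: create any missing rows, nothing else.
# (machineIds = the known machine list.)
for machineId in machineIds:
    system.db.runPrepUpdate(
        "INSERT INTO machine_status (machine_id) "
        + "SELECT ? WHERE NOT EXISTS "
        + "(SELECT 1 FROM machine_status WHERE machine_id = ?)",
        [machineId, machineId], "statusDB")

# Task 2 - every 60 seconds: plain updates, no existence check needed.
# (statusRows = the parsed web service results.)
for row in statusRows:
    system.db.runPrepUpdate(
        "UPDATE machine_status SET status = ? WHERE machine_id = ?",
        [row["status"], row["id"]], "statusDB")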

I echo @pturmel: if this is just status data then you could convert it to tags. Getting the data doesn’t change, just where you’re writing it to.

If you want to stick with the DB, using an UPSERT query can allow you to check for existing entries and update/insert all within the same query. You could also offload the DB queries to Store and Forward, but that somewhat just pushes the problem around.

The OP is using MS SQL, which doesn’t have a direct UPSERT (unlike, say, PostgreSQL), so the only option is a manual test and then either an update or an insert - which is where they are already starting from.

As for making the data tags, I would agree, but I myself just delivered a system where the real-time status data was written into a specific table in an MSSQL database - because that is what the customer explicitly asked for :man_facepalming:

Correct, there is no UPSERT command in MSSQL, but you can certainly do it using the MERGE statement to do everything in a single query.

Plenty of examples out there; this one isn’t too bad: SQL Server MERGE to insert, update and delete at the same time (mssqltips.com)

Thanks, guys, for the help. My current plan of action is to look into the feasibility of writing this data to a tag set and maybe using an HTTP API for it, and if I decide against that for any reason, I like Peter’s option of splitting the update from the original insert.

I’ve previously seen that website about the MERGE command, but one thing that always troubled me about it (and various similar websites) is that they only show merging data from one table to another, and not from literal values to a table. So I decided to search for that very thing and encountered SQL Server MERGE without a source table, which IMHO is a better explanation of how to use MERGE as an upsert replacement. From that link:

MERGE TARGET_TABLE AS I
USING (VALUES ('VALUE1','VALUE2')) as s(COL1,COL2)
ON I.COL1 = s.COL1
WHEN MATCHED THEN
...
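
Fleshed out as an upsert, I imagine it would look something like this from Ignition (untested sketch, names made up; the USING (VALUES ...) list can also carry multiple rows at once):

query = """
MERGE machine_status AS t
USING (VALUES (?, ?)) AS s (machine_id, status)
    ON t.machine_id = s.machine_id
WHEN MATCHED THEN
    UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN
    INSERT (machine_id, status) VALUES (s.machine_id, s.status);
"""
# MERGE statements must be terminated with a semicolon in SQL Server.
system.db.runPrepUpdate(query, [machineId, status], "statusDB")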

However I wonder about the query plan that is generated by a MERGE and how complex it is compared to a simple insert/update. That would be an interesting question to answer.