queryHistory() functionality

I am using queryHistory() in my module to retrieve historical data, then preprocessing it in a specific way before using it elsewhere in the module. I have found that preprocessing after the query returns is actually faster than formatting my query params to get the same result. However, by inserting the preprocessing step between the query and the use of the data, I am hobbling the memory-saving abilities of the BasicStreamingDataset. Is there a way to peek at the underlying structure of queryHistory() so I can write my own stripped-down version? I have a feeling there might be a way to combine my methodology and Inductive's to make data retrieval faster.

This is all in the name of reducing query time, as I am usually querying 3-24 hours' worth of historical data for 50-700 tags at a time, which can be quite intense. For instance, a query for 3 hours of data on 266 tags takes about 40 seconds to complete.

OK, I can see that looking at the internals of an Inductive method is a no-go. As a replacement, could someone help me understand the threading model of the Ignition client? I have multithreaded the query and drastically improved performance (a 10 minute query now takes less than a minute), but it is still not as fast as I want it to be. How is the RPC call to the gateway handled from a threading perspective? Why do some threads take exponentially longer to execute if all threads have the same workload?
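
For context, the multithreading I'm doing looks roughly like this (a simplified sketch; the tag paths and tuning numbers are placeholders, and I'm using system.tag.queryTagHistory here to stand in for my module's queryHistory call):

```python
from java.util.concurrent import Executors, Callable

class ChunkQuery(Callable):
    def __init__(self, chunk, start, end):
        self.chunk = chunk
        self.start = start
        self.end = end
    def call(self):
        return system.tag.queryTagHistory(
            paths=self.chunk, startDate=self.start,
            endDate=self.end, returnFormat="Wide")

paths = ["[default]Folder/Tag%d" % i for i in range(266)]  # placeholder tags
end = system.date.now()
start = system.date.addHours(end, -3)
CHUNK = 25  # tags per query -- one of the knobs I'm unsure about

pool = Executors.newFixedThreadPool(8)  # thread count, another knob
try:
    futures = [pool.submit(ChunkQuery(paths[i:i + CHUNK], start, end))
               for i in range(0, len(paths), CHUNK)]
    results = [f.get() for f in futures]  # some of these take far longer
finally:
    pool.shutdown()
```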

Especially in 8.1, but to a lesser extent in 8.3, the answer is "it's complicated", and it depends on where the RPC call is originating from.

If you're specifically referring to automatic tag history fetches from a binding in Vision, that's a different situation than someone running history queries from scripting, for instance.

In 8.1, no matter the origin, RPC calls eventually hit com.inductiveautomation.ignition.client.gateway_interface.GatewayInterface, which usually does not have any concurrent request limit (there's a rarely used project property that can change that). It creates a new HttpURLConnection back to the gateway, where the Gateway servlet handles the incoming XML-formatted POST and delegates to the appropriate RPC handler. Those incoming requests are also not meaningfully rate limited, at least by Ignition code; each request that comes in is served as readily as Jetty can consume it.
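
One practical upshot: since nothing on either end meaningfully limits concurrency, code that fans out requests should bound itself. A minimal Jython sketch of one way to do that (the permit count of 8 is arbitrary, and boundedQuery is just an illustrative wrapper):

```python
from java.util.concurrent import Semaphore

# Arbitrary cap on in-flight RPC calls; neither the client nor the
# gateway will enforce one for you.
RPC_PERMITS = Semaphore(8)

def boundedQuery(paths, start, end):
    # Blocks until a permit is free, so at most 8 queries are ever
    # in flight regardless of how many threads call this.
    RPC_PERMITS.acquire()
    try:
        return system.tag.queryTagHistory(
            paths=paths, startDate=start, endDate=end)
    finally:
        RPC_PERMITS.release()
```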

I am running queryHistory() from within a module (the RPC call technically originates from the client scope, but eventually calls the gateway method) to retrieve large amounts of data. In your opinion, what would be the fastest query approach? Multithreading seems to be the road to travel, but the methodology of multithreading can get complicated (query chunk size, chunk count, thread pool size, etc.).

From your previous answer, it seems that the bottleneck is truly the speed at which the gateway can retrieve data from the database and send it back to the client. Would I be correct in that statement?

There's a lot to unpack here, and without seeing your whole system holistically it's really hard to answer in more than vague terms, but I'll give it a shot.

First off, in the general case, you can assume that network operations are going to be dramatically slower than operations on the same machine, which are themselves going to be dramatically slower than operations in the same process.

This is true to such a degree that, if you're doing multiple network operations, in a sense it doesn't matter how efficient your in-process code is. This is where multithreading can make sense: if you must make multiple network operations, doing them in parallel instead of in series is naturally going to reduce wall clock time.

Now, speaking of 8.1 (and prior) versions specifically, there's memory efficiency as a target axis.
As you describe, any Ignition operation involving the database is actually being done by the gateway, even if it's on behalf of a particular client.
Thus, there is always time inefficiency built in; the request flows from the client through the gateway to the database and back. This is unavoidable.
Client -> Gateway -> Database -> Gateway -> Client.

The only benefit of StreamingDatasetWriter is to improve memory efficiency.
Because the gateway is purely serving as a 'proxy' in this case, there's no need for it to buffer a complete response in memory only to immediately discard it; instead, it can drain the write down to the client as the data arrives.
The queryHistory API is designed around this plan, hence StreamingDatasetWriter, and the concrete implementation designed for use in RPC, com.inductiveautomation.ignition.gateway.servlets.gateway.GatewayResponseStreamingDatasetWriter, which is unfortunately not public API in 8.1.
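
If it helps, the memory difference is the same one you'd see in plain Python between materializing a whole list and handing rows off one at a time. This is only an analogy, not the actual StreamingDatasetWriter interface:

```python
def transform(row):
    return row  # stand-in for whatever per-row work happens

# Buffered: the entire result is materialized before the first byte
# goes to the client; peak memory is the whole dataset.
def buffered(rows, write):
    result = [transform(r) for r in rows]
    for r in result:
        write(r)

# Streaming: each row is handed off and discarded as it's produced;
# peak memory is a single row.
def streaming(rows, write):
    for r in rows:
        write(transform(r))
```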

However, all of this tangential discussion, while interesting, is almost completely moot.
For one, the way RPC works changed entirely in 8.3, making it much easier to immediately start streaming results of any kind down to the client (improving time efficiency and not just memory), thus somewhat circumventing the limited nature of streaming datasets.

But the most important thing is:

That's not that much data.
That it takes 40 seconds indicates a pathological issue in your database/historian. I've seen historians with millions of tags return hundreds of thousands of data points per query far faster than tens of seconds. If the query itself is slow, then the thing to focus on is your database/historian; all the rest of the optimization in the world is moot.

This is very interesting, and thank you for your detailed response. What kinds of things would I be able to optimize on the database/historian side? Would they be more related to history configuration on the tags, or actual database configuration settings in MSSQL?

Might just be missing/incorrect indexing; see this thread where someone cut their 1.5 minute history query times down dramatically:
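
If you want a quick look at what indexes you currently have without leaving Ignition, something like this from the script console works (the partition table and datasource names here are examples; substitute your own):

```python
# Inspect the indexes on one historian partition table. Both the
# table name and the datasource name are examples.
indexes = system.db.runPrepQuery(
    "EXEC sp_helpindex ?", ["sqlt_data_1_2024_01"], "HistorianDB")
for row in indexes:
    print row["index_name"], "-", row["index_keys"]
```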

I’ll take a look and report back.

No luck with changing the indexing/primary keys. 1 hour of data (3600 rows) for 538 tags still takes 15 seconds to retrieve. I am using MSSQL, by the way.

I'm with Paul. 3600 rows should return nearly instantly. Talk to whoever is running your MSSQL server. In my last job, I was working with an MSSQL job that handled 4 billion rows in the return, and that only took a couple of minutes. And SQL was doing a lot of crunching to generate that on the fly, as it was doing joins, pivots, and calcs all at the same time.

Give SQL enough horsepower, and it can fly.

We ran some tests on the actual server and were getting 400,000 rows back in 3-4 seconds. I also compared using queryHistory() in the module against queryTagHistory() natively in a button script, and queryTagHistory() was almost four times faster. Is there something inherently slow about RPC calls from a module? Does system.tag.queryTagHistory() use a different communication protocol than RPC?
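
For reference, the button-script side of my comparison looked roughly like this (simplified; the tag paths are placeholders), with the module side wrapping queryHistory() around the same range:

```python
# Timing harness from the button script. Tag paths are placeholders;
# the real test used our actual 538 tags over 1 hour.
paths = ["[default]Folder/Tag%d" % i for i in range(538)]
end = system.date.now()
start = system.date.addHours(end, -1)

t0 = system.date.toMillis(system.date.now())
data = system.tag.queryTagHistory(paths=paths, startDate=start,
                                  endDate=end, returnFormat="Wide")
elapsed = system.date.toMillis(system.date.now()) - t0
print "rows:", data.getRowCount(), "elapsed ms:", elapsed
```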

queryTagHistory from the client side uses the same RPC protocol modules have access to.

You need to figure out where exactly in the process things are slow. Per your last post, you're saying it's not the DB, which disagrees with the earlier screenshot you posted showing 30 seconds of disk I/O during the query.
Figuring out the actual source of the problem is going to be the first useful step to troubleshooting, otherwise you're just throwing stuff at the wall to see what sticks.
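
One way to split the problem in half: time raw SQL against a historian partition directly, then time the same range through tag history from the same scope. If the first is fast and the second isn't, the DB is off the hook. A sketch, with example table/datasource/tag names throughout:

```python
# Example names throughout -- substitute your own partition table,
# datasource, and tag path.
end = system.date.now()
start = system.date.addHours(end, -1)

# 1) Raw SQL straight at the partition table: DB + network only.
t0 = system.date.toMillis(system.date.now())
raw = system.db.runPrepQuery(
    "SELECT tagid, floatvalue, t_stamp FROM sqlt_data_1_2024_01 "
    "WHERE t_stamp BETWEEN ? AND ?",
    [system.date.toMillis(start), system.date.toMillis(end)],
    "HistorianDB")
t1 = system.date.toMillis(system.date.now())

# 2) The same range through the tag history system.
hist = system.tag.queryTagHistory(paths=["[default]Folder/Tag0"],
                                  startDate=start, endDate=end)
t2 = system.date.toMillis(system.date.now())
print "raw SQL ms:", (t1 - t0), "tag history ms:", (t2 - t1)
```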