Historical Tags Retrieval

jconnor34 · November 28, 2011, 11:04pm

I have some reports that require some more advanced filtering that I can’t seem to get from the current tools. I have several historical tags that gather data during the day. I need to get the max value for each tag daily, within a time range for that day and be able to aggregate them in daily,weekly,monthly,quarterly,yearly. The RowSelector component looked promising, but there is no way to bind the SHIFT start and end times to an external tag. You are required to hard code the times, which doesn’t work for my application.

I can use the python scripting to get data for a date range, but it doesn’t filter by shifts and doesn’t allow me to just get the maximum value per day.

Are there any sample scripts for getting historical data in a standard SQL query, which would allow the type of filtering that I would need?

jconnor34 · November 29, 2011, 2:28pm

Maybe I should rephrase my question. I want to be able to use standard SQL to get a dataset from a historical tag. This would be on pre 7.3 versions. When I look at the tables for the historical tags I do not understand the structure and how to do a valid SQL statement to get the data from the tables. It appears that there is some partitioning of tables so my question would be how all that would work.

For example, say I have a historical tag TagA and want to find the maximum values for each day between a time range. Normally the SQL would be something like:

SELECT DATE(t_stamp), MAX(TagA) FROM historicalTable WHERE TIME(t_stamp) IS BETWEEN time1 and time2

How would I accomplish a similar type of query against the historical tables including cases that span partitions?

jconnor34 · November 29, 2011, 5:00pm

I have figured out the following on my own and am almost to the point of getting what I need. I found that the data is stored in partitions named sqlt_data_X_YEAR_MONTH. The tagid column in these tables is an integer value that is defined in the sqlth_te table. All the timestamps are in the UNIX epoch format so here is a SQL statement that will get the max value of a tag daily, filtered by a shift:

SELECT MAX(intValue),FROM_UNIXTIME(t_stamp/1000) FROM sqlt_data_1_2011_11
WHERE tagid=(SELECT id FROM sqlth_te WHERE tagpath=‘YOUR_TAG_PATH_HERE’)
AND TIME(FROM_UNIXTIME(t_stamp/1000))>=‘08:00:00’
AND TIME(FROM_UNIXTIME(t_stamp/1000))<=‘15:00:00’
GROUP BY DATE(FROM_UNIXTIME(t_stamp/1000))

This will return a dataset with the max value of the tag for each day within the shift defined. The shift times can be bound to properties or tags to make it dynamic.

The only part remaining is the FROM sqlt_data…

I am assuming that the data could be partitioned over several months or even several partitions for a single month. Based on the date range I am assuming that the table names could be discovered from thje sqlth_partitions table, but I haven’t got that far yet.

Colby.Clegg · November 29, 2011, 7:01pm

Hi,

Yes, as you’re discovering, we didn’t exactly have direct querying in mind when we built the tables, though the flip side of that is that the query interface in the software doesn’t currently let you do things like this easily. That said, you might find it easier to do this in scripting, by querying the raw data, and doing the processing yourself. This would let you ignore the partitions, and the table structure in general.

Going forward with SQL queries, you’ve identified the main parts. Yes, the partitions table maps tables to time ranges. The sqlth_te table maps tags to ids, and the data just uses the id. I don’t think it’s possible to dynamically figure out which tables need to be queried, and then join them together, though you might be able to do this in a stored procedure, or maybe by using a view that unioned them all together (I’m not sure about the performance of doing this, I’m afraid it wouldn’t be good).

Anyhow, the partition size is arbitrary, and can be made whatever you want. You could just change it to be large (5 years) and not worry about partitions. In fact, I suppose there should be an option to just turn them off.

Sticking with what you currently have, I want to offer 2 suggestions in regard to query performance:

Remove the sub query, these are generally less efficient than joins.
Change the way you limit the time, to optimize index usage.
In other words…

SELECT MAX(d.intValue),FROM_UNIXTIME(d.t_stamp/1000) FROM sqlt_data_1_2011_11 d, sqlth_te t
WHERE d.tagid=t.id and t.tagpath='YOUR_TAG_PATH_HERE'
AND d.t_stamp between unix_timestamp(date_add(date(now()), interval 8 hour) and unix_timestamp(date_add(date(now()),interval 15 hour) GROUP BY FROM_UNIXTIME(d.t_stamp/1000)

See, by leaving t_stamp alone, the database doesn’t need to do a full table scan (at least, it doesn’t need to run those functions against each value in the table). Instead, it runs the functions once, and then just compares the raw db value to them, which more efficient.

Now that I look at it, I don’t think the group by is correct/necessary, and for what you really want to do you’ll have to adjust the where clause a bit, but I hope this gets you a bit closer to what you want to do. Maybe some friendly script ninja will show up with an example of how to do it in scripting…

Regards,

jconnor34 · November 29, 2011, 8:11pm

In the table sqlt_data_X_YEAR_MONTH

What does the X represent?

Colby.Clegg · November 29, 2011, 10:28pm

That value refers to the driver id, defined under sqlth_drv. In addition to being partitioned based on time, multiple systems storing SQLTags History data to the same data will be partitioned into their own tables. So, if you added another Ignition and pointed it at the same DB, you would end up with sqlt_data_2_… and so on.

Regards,

jconnor34 · November 29, 2011, 10:58pm

Ok, thanks.

If I was to change the Partitioning from Monthly to Yearly, would Ignition take the current monthly tables and convert them to yearly tables? I am assuming that is not the case…

Colby.Clegg · November 29, 2011, 11:05pm

No, it won’t do it automatically. What it will do is “end” the current partition immediately, and create a new partition for the rest of the year.

Here’s what you could do:

Change the partition to be a year. Wait for the new table to be created.
Transfer old data using a query like:

insert into sqlt_new_table select * from sqlt_data_1_2011_11

Delete the old partition rows from sqlth_partitions.
Update the start time of the new partition to cover the old data. You could just set it to 0 if you wanted, or do “update sqlth_partitions set start_time=(select min(t_stamp) from data)…”

Regards,

jconnor34 · November 30, 2011, 12:51am

SELECT MAX(d.intValue),FROM_UNIXTIME(d.t_stamp/1000) FROM sqlt_data_1_2011_11 d, sqlth_te t WHERE d.tagid=t.id and t.tagpath='YOUR_TAG_PATH_HERE' AND d.t_stamp between unix_timestamp(date_add(date(now()), interval 8 hour) and unix_timestamp(date_add(date(now()),interval 15 hour) GROUP BY FROM_UNIXTIME(d.t_stamp/1000)

The above query that you recommended does not make sense to me. It is using dateAdd to Now() which would just return the max for a single day…right? I need the max value for each day, within the shift times, for a selected date range which need to be dynamic and are down to the minute.

I have used your recommendation of using a JOIN instead of Sub-Query, but the time improvement did not make much sense.

Thanks for your help…