SOE data from SQLT history tables

Robert · February 8, 2011, 11:15pm

Note all SQL in this post works for Postgres, your mileage may vary

So what was that big table I referred to here? Why, the history table.

My problem: How to get the SOE data from the history table. Colby and the boys created the tall format for the system.tag.queryTagHistory table for me . This worked well for smaller systems. I broke it recently by passing in a large number of tags. In my industry (electric utility), the customer wants to see every ‘event’ (status change) in the system at a specified time to help with fault cause analysis. To do this I had to feed every status tag in the system to system.tag.queryTagHistory. Once a got few 1000 tags…

My solution is this post. Please review the previous post for how to call this and update the table component. This post deals with getting the data only.

step one:
call [code]def get_SOE():
import app
import system

soeWindow = system.gui.getWindow('SOELog')
table = soeWindow.rootContainer.getComponent('SOEs').getComponent('SOETable')
timeFilter = app.history.filterTime()
pathFilter = app.history.filterPath()
partitions = app.history.getPartitions(timeFilter[1],timeFilter[2])
sql = app.history.getQuery(partitions,timeFilter,pathFilter)
newData = system.db.runQuery(sql, 'HistoryDB')

[/code]
Not too much happening here. This function just controls it all. The details are below:

[code]def filterTime():
import system

soeWindow = system.gui.getWindow('SOELog')
s = soeWindow.rootContainer.getComponent('SOEs')
options = s.getComponent('Options')

startDate = options.getComponent('StartDate').date.getTime()
endDate = options.getComponent('EndDate').date.getTime()
timeFilter = " AND sd.t_stamp > %ld AND sd.t_stamp < %ld "%(startDate,endDate) 
return [timeFilter,startDate,endDate]

[/code]
Here we are getting the user selected start and end times from a couple of calendar dropdown components. The .getTime() function returns the selected dates as milliseconds since the epoch(Jan 1,1970). Conveniently, this is the same format the the history tables use to store time . We then format an SQL snippet and return all three bits. (we need the timestamps later)

[code]def filterPath():
import system

soeWindow = system.gui.getWindow('SOELog')
s = soeWindow.rootContainer.getComponent('SOEs')
options = s.getComponent('Options')

pathfilter = ""
if options.getComponent('Device').selected == 1:
    pathfilter = options.getComponent('Text Field').text
if options.getComponent('Point').selected == 1:
    pathfilter += "%/%"+options.getComponent('Text Field 1').text
pathFilter = " AND st.tagpath LIKE '%" + pathfilter + "%'"
return pathFilter

[/code]
The user has the option of filtering the tags to return a subset. This function creates the required SQL snippet to do this. If no filtering is selected, we return an empty string. This allows us to blindly include the path filter into the final SQL.

[code]def getPartitions(startTime,endTime):
import system

return system.db.runQuery("SELECT pname from sqlth_partitions \
                   WHERE (start_time < %d and %d < end_time)\
                   OR (start_time < %d and %d < end_time)"%(startTime,startTime,endTime,endTime),'HistoryDB')

[/code]
The history data is partitioned into separate tables. This is a good thing because you don’t want your table to have more than a few million rows in them. In this case it also means that we can’t just query a single table to get our data. This piece of code checks the sqlth_partitions table to find out what tables our data is stored in. This is based on the date range set by the user. (I told you we need those timestamps later, now it’s later). This function returns a dataset with a list of tablenames.

[code]def getQuery(partitions,timeFilter,pathFilter):
import system

sql =""
for partition in partitions:
    if sql != "":
        sql += " UNION "
    sql += "SELECT st.tagpath as \"Point\", \
               sd.intvalue AS \"State\",\
              CASE sd.dataintegrity WHEN 192 THEN 'Good' WHEN 32 THEN 'Not Reported' WHEN 24 THEN 'Comm Failed' ELSE 'BAD' END AS \"Quality\",\
              TIMESTAMP WITH TIME ZONE 'epoch' + sd.t_stamp  * INTERVAL '1 millisecond' || ' ' AS \"Timestamp\"\
              FROM sqlth_te st, "+partition[0]+" sd \
              WHERE st.id = sd.tagid AND st.datatype = 0 "+timeFilter[0] + pathFilter
sql += " ORDER BY \"Timestamp\" DESC limit 10000"
return sql

[/code]
Time to put it all together. We loop though our list of partitions and for every one of them we have to build a select statement. Then we have to UNION the select statements together so we get a single dataset back. I like making the database do as much work for me as possible. After all, it’s written by a bunch of developers that are a whole lot smarter then me at making things really fast .

Let’s break this one down a bit:

if sql != "": sql += " UNION "
we only want the UNION keyword to be in between the selects. This does that by ensuring that sql has something in it (so we don’t put it at the front) and by putting it at the top of the loop, we make sure that a select will follow it (so we don’t put it and the end).

CASE sd.dataintegrity WHEN 192 THEN 'Good' WHEN 32 THEN 'Not Reported' WHEN 24 THEN 'Comm Failed' ELSE 'BAD' END AS \"Quality\",\

These line changes the quality code into readable English. Note that I am only concerned with a couple of values. Everything else is just BAD. You could also get the quality code as a numeric value which you can use to colour rows.

TIMESTAMP WITH TIME ZONE 'epoch' + sd.t_stamp  * INTERVAL '1 millisecond' || ' ' AS \"Timestamp\"\

This is postgres for “show me the timestamp as a written date”.

WHERE st.id = sd.tagid AND st.datatype = 0 "+timeFilter[0] + pathFilter

The where clause restricts the dataset to non-analog values. (if you want those, use the graph component). We also add in our user requested time and tag filters we created above.

sql += " ORDER BY \"Timestamp\" DESC limit 10000"

We finish off by sorting the returned data by time and limiting the result to a reasonable amount. If it does not return the range the user wants, they can change the dates and try again.

Using the above, I can search 14,000,000 rows of data in two partitions for 1200 tags and get back 10,000 rows in less then 5 seconds. This is with a standard db install with no tuning.

diat150 · February 9, 2011, 7:35pm

[quote=“Robert”]

Using the above, I can search 14,000,000 rows of data in two partitions for 1200 tags and get back 10,000 rows in less then 5 seconds. This is with a standard db install with no tuning.[/quote]

thats impressive.

Robert · February 9, 2011, 8:11pm

[quote=“diat150”]
thats impressive.[/quote]

That’s just our demo system. Otherwise I would be tuning the database.

nathan · February 10, 2011, 3:55am

That is impressive! I’m curious how much improvement a solid state drive (SSD) would offer.

Colby.Clegg · February 10, 2011, 4:55pm

Very nice. One thing that might also help boost performance and reduce the complexity of your query process is to look into using the database’s partitioning facilities. We decided to do our own based on time in order to offer some performance gain that would work for any db- but pretty much all of the big db systems offer some sort of partitioning mechanisms built in.

For example, you could set Ignitions partition size to 1 year (or longer, I don’t think there’s currently a way to say “off, everything in one table”), and then modify the table using the db’s syntax to have it partition into smaller segments. Doing this, it always looks like one table to you, but the db stores and searches it separately.

Robert · February 10, 2011, 6:28pm

Hi Colby,

I have looked into database partitioning before. Postgres supports partitioning through inheritance. Basically you create a master table (which stays empty) and a number of child tables. Each child table will have a constraint that limits new data to a range (usually a date) for that partition. It gets more complex because you have to create the child table and modify the constraint for each of them. As I said, if this was a production system I’d be looking into this sort of thing.

Mysql partitioning is still fairly new. (I think it was interduced in ver 5)

It’s also the sort of thing that you really want a DBA on hand to help set up.(cause you just tell him to do it and not worry about it )

For more information see:
Postgres
Mysql

Robert · February 10, 2011, 6:30pm

[quote=“Colby.Clegg”]
For example, you could set Ignitions partition size to 1 year (or longer, I don’t think there’s currently a way to say “off, everything in one table”)[/quote]

Sounds like another feature request

DennisWomack · February 11, 2011, 4:22pm

SOE data?

Robert · February 11, 2011, 4:32pm

Sequence Of Events

Basically in what order did things happen.

DennisWomack · February 11, 2011, 5:01pm

Thanks. I just had not been exposed to the acronym.

In your industry, very necessary for root cause analysis.

Did the steam turbine really run backwards?

Dennis

Robert · February 11, 2011, 10:36pm

Robert · February 15, 2011, 5:02pm

So am I! Got one I can play with?