Retrieving "Periodic" Data

I’m considering FactorySQL for a project but have questions related to data retrieval.

I’ve been told that the data is only written to a DB table when it changes. I understand the advantages of this method in that it greatly reduces both the number of writes and the amount of data that has to be stored.

However, my concern is extracting the data from the database. For example, assume I have a generic application (i.e., not FactoryPMI) that generates reports based on periodic data, i.e., it expects values at evenly spaced intervals - say every 5 minutes over a 24 hour period.

How easy is it to get the data out of the database in this format? Would I have to rely on custom/complicated queries and/or stored procedures to extract the data?
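
For example, this is the sort of thing I’m afraid I’d end up writing against a change-only log just to resample a single tag onto a 5 minute grid (the table and column names here are made up, and sample_times is an imagined helper table holding the 5 minute marks):

-- hypothetical change-only log: tag_log(t_stamp, tag_value)
-- hypothetical helper table of 5-minute marks: sample_times(sample_time)
SELECT s.sample_time,
       (SELECT TOP 1 l.tag_value
        FROM tag_log l
        WHERE l.t_stamp <= s.sample_time
        ORDER BY l.t_stamp DESC) AS tag_value
FROM sample_times s
WHERE s.sample_time >= '2007-06-01 00:00:00'
  AND s.sample_time <  '2007-06-02 00:00:00'
ORDER BY s.sample_time

Multiply that by dozens of tags and reports and it starts to look like a lot of custom SQL.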

Alternatively, is there a way to configure FactorySQL to write data periodically, even if the data hasn’t changed?

Your insights are greatly appreciated.

Hey-

Actually, it is the normal behavior to log all the time, even when nothing’s changed. If you have a group set to execute every 5 minutes, and it’s set to “Insert new record”, it will insert that record with the current data every 5 minutes.
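
With records landing at a fixed 5 minute interval like that, pulling a 24 hour report back out is just a plain range query. A rough sketch, with hypothetical table and column names:

-- one row every 5 minutes in a hypothetical group table
SELECT t_stamp, tag1value, tag2value, tag3value
FROM group_table
WHERE t_stamp >= '2007-06-01 00:00:00'
  AND t_stamp <  '2007-06-02 00:00:00'
ORDER BY t_stamp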

Going to the trigger tab and selecting the “Async” mode would cause it to only log changes.

Hope that helps,

Colby,

Thanks for answering my question. Your answer is very beneficial and insightful.

In light of your answer, what kind of throughput can I reasonably expect? How fast can I write values to, say, an MS SQL Server? Does logging to a remote SQL Server, versus a local SQL Server, significantly impact the speed?

Thanks again!

Overall throughput is a pretty complicated subject. In general, the efficiency of FactorySQL depends a lot on the layout of the data (the number of groups and the types of groups), in addition to the raw number of tags and the rate at which you’re trying to log.

The native connector for SQL Server is very efficient, usually taking just a few milliseconds for logging queries. Going to a remote server versus a local one doesn’t matter much, but of course that depends on exactly how remote it is: a different machine on the same network adds almost no extra time, but a server across the internet or a WAN might be a different story. This latter case is especially important when you consider that the connection is more likely to go down. FactorySQL will cache data, but the process of detecting a down connection and then monitoring for it to come back up is obviously more work than having it never go down.

Finally, depending on what you’re trying to accomplish (and how your data is laid out), using block groups can help improve throughput. Essentially, block groups let you define related data together in one group, in situations where you otherwise would have used a separate standard group for each row. Doing this allows multiple queries to be executed together as a transaction, which is much more efficient than running them separately.
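
As a rough illustration of why that helps (this isn’t the exact SQL FactorySQL generates, and the table names are made up), compare issuing each insert on its own round trip with batching the same rows into one transaction:

-- separate standard groups: one insert, and one round trip, per row
INSERT INTO line1_data (t_stamp, temperature, pressure) VALUES (GETDATE(), 71.2, 30.1)
INSERT INTO line2_data (t_stamp, temperature, pressure) VALUES (GETDATE(), 70.8, 29.9)

-- block-group style: the same values batched into one transaction against one table
BEGIN TRANSACTION
INSERT INTO block_data (row_id, t_stamp, temperature, pressure) VALUES (0, GETDATE(), 71.2, 30.1)
INSERT INTO block_data (row_id, t_stamp, temperature, pressure) VALUES (1, GETDATE(), 70.8, 29.9)
COMMIT TRANSACTION

The per-query overhead (round trip, commit) gets paid once for the whole batch instead of once per row.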

What kind of numbers are you talking about? I could throw out hypothetical situations, but since there’s so many ways to accomplish things it may help to get a rough idea of what you’re trying to accomplish.

Regards,

Colby,

Let’s assume that I would like to store between 1500 and 2000 tags every 10 seconds or so. Of the tags, 75% are digital/discrete and the other 25% are analog values. Let’s also assume that the data is logged to a local instance of MS SQL Server running on a PC with a 3 GHz processor, 4 GB of RAM and SATA drive(s).

Is it realistic to expect FactorySQL to be able to log data this rapidly?

Is it better to define more groups with fewer tags or fewer groups with more tags?

Thanks for getting back to me.

  1. With the same logging rate, it’s better to have fewer groups with more tags. SQL Server 2005 has a 1024-column limit, so you’ll need at least 2 groups. Microsoft’s next version lifts this restriction.
  2. FactorySQL shouldn’t have a problem logging at this rate; you’re not dealing with that much data transfer bandwidth. The bottleneck could come down to the OPC Server->PLC interaction. I would recommend using block data groups if you need realtime values, particularly if you expect faster updates.
  3. You can do a back-of-the-envelope calculation on how much drive space you’ll use: 2000 tags * 6 samples/min * 60 min/hr * 24 hr/day * 365 day/yr * 16 bits/smallint (the last factor depends on your data type). That comes out to roughly 100 gigabits, or on the order of 12-13 GB of raw values per year, and the actual table will be larger once timestamps, row overhead and indexes are added (see the quick check below). Either way, it is A LOT of data to go through in a useful amount of time, especially for charts/tables/reports, unless you’re always looking at smaller time periods.
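
As a quick sanity check of that math (raw values only, so no timestamps, row overhead or indexes; the numbers are just plugged into a throwaway query):

-- back-of-the-envelope: raw value storage per year
SELECT CAST(2000 AS BIGINT)   -- tags per row
     * 6 * 60 * 24 * 365      -- rows per year at one row every 10 seconds
     * 2                      -- bytes per smallint value
     / 1000000000.0           AS raw_gb_per_year   -- roughly 12.6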

Depending on the application I would consider the following:

  1. Log data this rapidly with an FSQL group that purges data that’s older than some amount of time (the purge amounts to the delete sketched after this list). Create another similar group to log the same data more slowly to a different table.
  2. If you’re looking to capture min/max/avg values while something is happening, you can use FSQL or the PLC to do so in a number of ways without storing so much data.
  3. Log when you need to. If you have a really fast process, you may only need to log during that period.
  4. Log more slowly. I find that many end users and integrators alike estimate logging rates that are way too fast. This depends greatly on your application.
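
For the first option, FactorySQL applies the purge for you as part of the group, but the effect is roughly this, run on a schedule against a hypothetical fast-logging table:

-- keep only the most recent 7 days in the fast table (table name and window are made up)
DELETE FROM fast_data
WHERE t_stamp < DATEADD(day, -7, GETDATE())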

Colby,

Thanks for the great information.

I apologize for asking a lot of questions. I have downloaded the demo and want to play with it, but my workload is not currently allowing me the opportunity. I’m sure that if I worked with it a little, a lot of my questions would be answered. But in the meantime, I’d like to ask a couple more questions and make a comment or two, if I may impose on you again.

Questions:

From your last post, I gather that all tag values in a group are written as a single record, not as multiple tag/value pairs. For example, a group with 5 tags would be written as a single record in something like the following form:

timestamp, tag1value, tag2value, tag3value, tag4value, tag5value

versus 5 records of the following form:

timestamp, tag1name, tag1value
timestamp, tag2name, tag2value
timestamp, tag3name, tag3value
timestamp, tag4name, tag4value
timestamp, tag5name, tag5value

Is this correct? Or is there a way to specify how a group is written? I would assume that the latter case would be significantly slower than the former. Is that correct?

What ancillary information, if any, can be written to a database along with the tag value (e.g., OPC timestamp?, OPC quality?, etc.?)

Comments:

In some cases, I prefer the latter case above because I can add new tags without having to change the table structure, assuming I can get the throughput I desire.

As you correctly pointed out, 10 second data would consume a huge amount of storage space. I would anticipate summarizing the “fast” data periodically (e.g., 5-15 minute min., max. & avg.) for long-term storage and purging “old” data as necessary.
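
Roughly, I picture the summarization as a periodic job along these lines (the table and column names are made up), rolling the fast table up into 5 minute buckets before the old rows get purged:

-- aggregate one tag from the fast table into 5-minute min/max/avg rows
INSERT INTO summary_data (t_stamp, tag1_min, tag1_max, tag1_avg)
SELECT DATEADD(minute, (DATEDIFF(minute, 0, t_stamp) / 5) * 5, 0),
       MIN(tag1value),
       MAX(tag1value),
       AVG(CAST(tag1value AS FLOAT))
FROM fast_data
WHERE t_stamp < DATEADD(day, -1, GETDATE())
GROUP BY DATEADD(minute, (DATEDIFF(minute, 0, t_stamp) / 5) * 5, 0)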

Thanks again for your help.

The two ways that you mentioned contrast a dynamic schema (your first example) with a static schema (your second example) for data logging. We have chosen to implement the former for historical data logging for the following reasons:

  1. It saves space, since there is one timestamp, OPC quality code, etc., per interval instead of per value. This is huge in your 2000-values-every-10-seconds example.
  2. Based on our testing with various SQL databases, SELECT queries return results much more quickly than from a static schema (see the comparison sketch below).

Disadvantages compared to a static schema:

  1. It is harder to mold to existing programs that expect a static schema.
  2. It is inefficient to add new logged values to an existing table that already has many rows.
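
To give a feel for the SELECT point above (table and column names are hypothetical), pulling a trend for a couple of tags from the dynamic layout is a simple range scan over one row per interval, while the static layout has to be filtered and pivoted back into columns:

-- dynamic (wide) layout: one row per interval
SELECT t_stamp, tag1value, tag2value
FROM wide_history
WHERE t_stamp >= '2007-06-01' AND t_stamp < '2007-06-02'

-- static (narrow) layout: one row per tag per interval, pivoted back into columns
SELECT t_stamp,
       MAX(CASE WHEN tag_name = 'tag1' THEN tag_value END) AS tag1value,
       MAX(CASE WHEN tag_name = 'tag2' THEN tag_value END) AS tag2value
FROM narrow_history
WHERE t_stamp >= '2007-06-01' AND t_stamp < '2007-06-02'
  AND tag_name IN ('tag1', 'tag2')
GROUP BY t_stamp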

To answer your question: you can store the OPC quality code by checking that box in the group options. The idea is that you choose your values to log and use that group. If you have lots of data and need to log additional values, create a new FactorySQL group and write the data to a different table.

When you get to the presentation side, if you’re using FactoryPMI, you can make it pretty seamless to the end user. As an aside, FactoryPMI Easy Charts can graph data stored in a static schema (although the dynamic schema approach is a bit quicker to set up). We’ve had customers use it with data logged from RSView/RSSQL.

We have a pending feature request to add an FSQL historical group that stores data in a user-defined static schema. This makes more sense with additional features such as logging on deadband value changes, etc. With the logging rates you described, FactorySQL could handle the bandwidth with ease, but I question the performance that you would get with that kind of data layout and that much data. You would be talking about A LOT of rows.

Hi-

Nathan’s post above has some useful info. I just wanted to follow up to clearly address your question about data layout:

A normal group will, in fact, log exactly like you thought:
timestamp, tag1value, tag2value, tag3value, tag4value, tag5value

However, a “Block Group” will log something more or less like your second example:

0, timestamp, tag1value
1, timestamp, tag2value
2, timestamp, tag3value
3, timestamp, tag4value
4, timestamp, tag5value

So, the main difference is that it can’t actually put in the tag name; however, it can put in a “row id” as well as a “block id”, so you’ll be able to query for exactly what you want. In the near future we’ll support string-based row ids, but right now it’s just the index of the item in the group.

Now, in terms of performance, running multiple queries is normally slower… but the block group benefits from the fact that it can do them all at once. Queries inside a transaction (that is, a block of queries executed together) are much faster… generally speaking, you could have around 80-100 items writing vertically for about the same amount of work it takes to insert a row in a normal group.
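
And querying it back out just means filtering on that row id. A rough sketch, with made-up table and column names:

-- last 24 hours of the third item (row id 2) in a hypothetical block group table
SELECT t_stamp, item_value
FROM block_history
WHERE row_id = 2
  AND t_stamp >= DATEADD(hour, -24, GETDATE())
ORDER BY t_stamp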

Hope this helps,

That’s right. I forgot that you could use a block group to log historical data. You also get the added benefit of block reads from the OPC server.