Performance problems on FactorySQL start/restart

AlThePal · April 15, 2009, 1:36pm

I’ve noticed when I start a customer’s system that it was taking a long time to settle down to a steady state. I was able to replicate this behaviour by stopping and then restarting FactorySQL. When I did this, I noticed the following:

The processor usage of MySQL would jump from 0-3% of processor usage to 51-54%.
Calculated SQLTag points would not have a valid value.
Any standing alarms were only added to the alarm list very slowly.
The system responded very sluggishly.

Once all alarms were added to the alarm list (which took 4 or 5 minutes), processor usage would drop back down to normal, whether or not the alarms were acknowledged.

It would appear that the system is having problems coping with standing alarms when the FactorySQL service starts. Any thoughts?

Al

Colby.Clegg · April 15, 2009, 3:43pm

It definitely sounds like FactorySQL is doing something that is bottlenecking at the database- the problem could be with how FactorySQL is doing it, or just an issue with the tables that could be resolved with better indexes.

The thing to do would be to figure out which queries are taking so long. MySQL has some new analyzer tools that might do this well, but I haven’t really played with them yet. Instead, what I normally do is open the MySQL Administrator, and then go to “Server Connections” and look at the threads there. You’ll see the executing queries under the “Info” column. While periodically hitting the “refresh” button in the lower right, you’ll see a variety of queries pop in and out. Any normal query should only take <100 milliseconds to run, so most of time you’re just getting lucky to catch a particular query in the act. However, in a situation where there are many queries being executed or slow queries, it’s usually immediately apparent which they are. They’re the ones that stay up while you hit refresh, taking seconds to run. Try to see if anything fits this description, and report back what the queries are.

Also, how many SQLTags do you have? Reading and then updating the value of every tag is one of the big operations that occur during start up, but usually it doesn’t cause this much of a problem.

Regards,

AlThePal · April 15, 2009, 9:03pm

Thanks Colby, I’ll try looking for slow queries next time I’m on site, probably next week.

As for SQLTag numbers: 611 raw inputs (straight from the PLC) and 81 calculated values (using expressions).

Al

Colby.Clegg · April 15, 2009, 9:09pm

Ok, so the number of SQLTags really shouldn’t be a factor. Is this the same machine that you set up the audio alert feature on? That feature queries the entire alert log for entries where clear_time is null, so that could add a lot of overhead if the log table was large and clear_time wasn’t indexed…

but take a look at the thread list and see what you find.

Regards,

chris_d_sanders · April 16, 2009, 2:12am

Colby,

I am having a SIMILAR issue with sqltags. We have ROUGHLY 3400 to 3500 Alarm tags in KEP and all of them are SQL tags as well. Recently we added all of these tags and ALSO moved the installation of FPMI, FSQL, KEP and MySQL to another harddrive LOCAL to the machine (ie… not the physical harddrive with the operating system). Before the transfer and before the massive SQLtag add, i would say we had 700 sqltags, and ONCE in a blue moon they would go stale.

First:
what is a good rule of thumb for number of tags and the best way to implement them?

Second:
What is the best SET up as far as where the gateway, fsql, kep and mysql are located and set up?

-Chris

Colby.Clegg · April 16, 2009, 5:07pm

Well, when it comes to the raw number of tags, the SQLTags system should be able to easily handle a large number- though I would certain expect some amount of extra load/time to stabilize at startup due to the fact that all opc tags need to be written, since their values will have changed. (Note: right now opc tags are initialized with “unknown” value, not the previous value- thus they always change on startup. This is probably something that we could change to reduce the amount of work to do when they’re loaded).

A few suggestions:

Most importantly, break up tags in to multiple scan classes, whose run rates are appropriate for the tags. This can break up/reduce the work and minimize the bouncing between stale/not stale. Writing to the database is done inside the scan class execution, so writing many tags at a time can lead to the scan class going “stale”.
Check/change your database engine (with mysql). We’ve found that in general, MyISAM is more efficient for sqltags.

As for your second question about the layout of the machines: if you have multiple machines, separating the database can generally let you give it more resources and improve performance. I would always keep FactorySQL on the same machine as the OPC server, and FactoryPMI could go on either- though generally I think people prefer to get the database machine isolated and dedicated to that task.

Hope this helps,

chris_d_sanders · April 17, 2009, 3:21pm

When you say "break up tags" do you mean all ALARMS should be the same scan class and say... some temperature i want to see every 10 sec would be another? We have all alarms set at 1000 ms Low Rate and 10,000 ms Stale Timeout and Mode "Direct". Temperatures for overview screens are set at 10,000 ms Low Rate and 5,000 ms High Rate and 20,000 ms Stale Timeout and Mode "Leased". Is this similar to what you yourself would expect to see?

Colby.Clegg · April 17, 2009, 3:58pm

Sure, that’s fine. I mainly meant to not have all of the tags just in one scan class, so something like what you’ve done would be fine. The only thing I would point out would be that all of the tags in a leased scan class will be executed at the faster rate if even just one tag is on the screen- the lease occurs on the scan class level, not the tag. So, if there are many tags, it might be worth breaking them up into a number of similar leased scan classes.

For debugging purposes, if you’re having problems with tags going in and out of STALE, try taking a look at the sqlt_sci table. This will show you values of last_exec, next_exec that PMI should be looking at to determine staleness, along with other information such as the last execution duration. If your last/next exec are being updated correctly, and the execution duration isn’t unreasonable, it could indicate a different problem. Either way we should be able to narrow it down a bit.

Regards,