7.3.1 and External Database Driver

I tried updating to 7.3.1 on our production systems and ran into problems with the upgrade, so I rolled back to the previous release.

What happened was that all the subscriptions for our external database driver started behaving erratically and going stale prematurely, essentially immediately after the sci table was updated. Both hirate and lorate are set to 2000 ms, and the database is being updated at that frequency. Our driving application sets the sqlt_sci lastexec to now and nextexec to 3500 ms in the future, in case there is a delay in updating the database. The stale timeout is set to 15000 ms.

I would expect the tags of that scan class to become stale only once lastexec + 2000 + 15000 < now (or nextexec + 15000 < now), not immediately.

Nothing has changed in our database settings, and our custom driver works fine with 7.2.8 and earlier. On my normal development box I use the same settings and have no problems.

I’ve cloned the database and HMI to another box and have a Python script to simulate the external DB driver; it shows the same problem there as on the production box. The script works fine with 7.2.8 and earlier versions.
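
For reference, here is a stripped-down sketch of the kind of thing the script does. The table and column names (sqlt_sci, lastexec, nextexec) are from our setup, but the connection details are placeholders and mysql.connector is just what I happen to use, so treat it as illustrative:

```python
# Simulator sketch: every 2 s, stamp lastexec with the current time and push
# nextexec 3500 ms out, matching the settings described above.
import datetime
import time

import mysql.connector  # assumes MySQL Connector/Python is installed

RATE_MS = 2000               # hirate/lorate
NEXT_EXEC_OFFSET_MS = 3500   # nextexec = now + 3500 ms, slack for DB delays

conn = mysql.connector.connect(host="localhost", user="ignition",
                               password="...", database="tags")

while True:
    now = datetime.datetime.now()
    next_exec = now + datetime.timedelta(milliseconds=NEXT_EXEC_OFFSET_MS)
    cur = conn.cursor()
    cur.execute("UPDATE sqlt_sci SET lastexec = %s, nextexec = %s",
                (now, next_exec))
    conn.commit()
    cur.close()
    time.sleep(RATE_MS / 1000.0)
```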

Has anything changed that I need to be aware of regarding the updates that occurred in 7.3?

Any chance you updated Java too?

We definitely did not update Java on the production server, since we are comfortable with the release we have, even if it is ancient.

Looks like the dev machine that works is on Java 1.6.0_26. The one that does not work is on 1.6.0_10, but I updated the second test machine from 1.6.0_10 to 1.6.0_26 for comparison, and it behaves the same. So whatever we are seeing is not related to the Java version.

Hi,

Sorry I didn’t reply sooner. There were some minor changes between 7.2 and 7.3, but nothing too dramatic. The method for calculating staleness has stayed the same, as has the query used, but one minor change was in how we process the query results. The new way might be a bit less efficient, but I can’t imagine it making that much of a difference (in a test setup I have, it takes about 200 ms for 3000 scan classes).

I’ve noticed something, though, that was true in 7.2 as well: the stale timeout setting isn’t being used correctly; the default of 10 seconds is applied instead. Also, this is how staleness is being calculated right now:

  1. Get current DB time.
  2. Load records from the SCI table.
  3. Evaluate (db_now - next_exec > stale_timeout).
  4. Update running scan classes / tags.
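
To make step 3 concrete, here is a simplified sketch of that evaluation (names are illustrative, and note the hard-coded 10 second default standing in for your configured stale timeout):

```python
import datetime

# Sketch of the evaluation in step 3. The configured stale timeout should be
# used here, but at the moment the 10 second default is what gets applied.
STALE_TIMEOUT = datetime.timedelta(seconds=10)

def find_stale_scanclasses(db_now, sci_records):
    """sci_records: iterable of (scanclass_id, next_exec) rows from the SCI table."""
    return [sc_id for sc_id, next_exec in sci_records
            if db_now - next_exec > STALE_TIMEOUT]
```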

Since your case is pretty unique, with so many tags/scan classes, it’s a bit hard to say how the time involved in reading values and other execution could be affecting things. Ultimately, it seems like at some point next_exec is falling behind by more than 10 seconds, though if you’re updating it every 2 seconds, that seems unlikely.

If it’s easy enough, you could try setting your next_exec further in the future. The field is really only used for stale detection, so it’s OK if it’s not exactly accurate.

On our side, we should get the stale timeout to work correctly, and make sure the logging is in place to see how long the different operations are taking. I’ll let you know if anything else comes to mind to check.

Regards,

Thanks for finally replying.

I normally set next_exec about 50% further out than the expected execution time, since I didn’t want variations in database commit/select times to interfere. I have updated it to 5x the actual time and don’t notice any difference in behavior, so presumably it’s something else.

We noticed that if we change the sqlt_sc mode to a value not in (0, 1, 2), the driver seems to work as expected on at least one machine. We are not really crazy about making that change, since it could break with future updates.
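
Concretely, the change we made amounts to something like this (99 is an arbitrary invalid value; mysql.connector and the connection details are just for illustration):

```python
# Hypothetical sketch of the workaround: set mode to a value outside (0, 1, 2)
# so the gateway doesn't load the scan class. 99 is arbitrary.
import mysql.connector  # assumes MySQL Connector/Python is installed

conn = mysql.connector.connect(host="localhost", user="ignition",
                               password="...", database="tags")
cur = conn.cursor()
cur.execute("UPDATE sqlt_sc SET mode = 99")  # all scan classes; add WHERE to limit
conn.commit()
cur.close()
conn.close()
```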

Would it be an unreasonable change to set the mode to something other than 0, 1, 2?

Hi,

As it currently is, by setting the mode to an invalid value, you’re preventing the scan class from being loaded. That in turn prevents the scan class instance information from being used. When the tags are first loaded, they’ll be marked as “config error”, but when the value gets updated from the database, it will be applied.

I would say it’s an OK workaround for right now, though of course not ideal. The methodology of how tags are loaded and executed won’t change any time soon, so a minor update here or there won’t break it. Of course, I’d like to figure out why the scan classes are going stale to begin with, because that whole system plays an important role: if your driving app stops working for whatever reason, you currently won’t know about it (except through some other means).

I intend to add more logging in 7.3.2 to help us figure out exactly why the system thinks your scan classes keep going stale and then recovering.

Regards,

Actually, I think we tested with the invalid scan class mode, and it did detect when our application stopped updating the sci table and correctly invalidated the tags. So it does seem to work as we expect, and we would not be giving false indications of good quality if the driving app fails to update the tables. It also removes the stale status appropriately when the app resumes.

Since it looks like we will very soon push past our updated memory limit, we are considering pushing this into production in any case.

This seems like a silly question, but have you verified that you don’t accidentally have multiple entries in the _sc table with the same name? Or maybe somehow the same id? The fact that it continues to work even when the scan class instance shouldn’t be getting loaded makes me think there’s another one.
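
Something along these lines would flag duplicates; the sqlt_sc table name is from this thread, while the name and id columns and the connection details are assumptions to adjust to your schema:

```python
# Quick duplicate check against the _sc table. "sqlt_sc", "name", and "id"
# are taken from this thread / assumed; adjust to your actual schema.
import mysql.connector  # assumes MySQL Connector/Python is installed

conn = mysql.connector.connect(host="localhost", user="ignition",
                               password="...", database="tags")
cur = conn.cursor()
for column in ("name", "id"):
    cur.execute(
        "SELECT {0}, COUNT(*) FROM sqlt_sc "
        "GROUP BY {0} HAVING COUNT(*) > 1".format(column)
    )
    for value, count in cur.fetchall():
        print("duplicate {}: {} ({} rows)".format(column, value, count))
cur.close()
conn.close()
```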

Regards,

OK, we figured it out. And I’m red-faced about it.

We changed the default time column type from DATETIME to TIMESTAMP when we were first testing, because that is what we typically use in our databases. The problem is that MySQL implicitly gives the first TIMESTAMP column in a table a default of CURRENT_TIMESTAMP along with ON UPDATE CURRENT_TIMESTAMP behavior. Any modification to a row that does not explicitly set that column will update it to CURRENT_TIMESTAMP. What is happening is that you set leaseexpiration on the table, and as a result the database sets configchange to now. It looks like 7.3 added code to reload the scan class when configchange is updated and mark all tags as bad. So this is ultimately user error, I guess.
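
For anyone else who hits this, a simplified illustration of the pitfall (the demo table name and schema here are hypothetical, not our actual tables):

```python
# MySQL 5.x gives the first TIMESTAMP column in a table an implicit
# DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP unless you
# specify an explicit DEFAULT or ON UPDATE clause yourself.

PITFALL_DDL = """
CREATE TABLE sqlt_sc_demo (
    id INT PRIMARY KEY,
    configchange TIMESTAMP,      -- implicitly ON UPDATE CURRENT_TIMESTAMP!
    leaseexpiration DATETIME
)
"""
# With the table above, this statement also silently bumps configchange:
#   UPDATE sqlt_sc_demo SET leaseexpiration = NOW() WHERE id = 1;
# ...which the gateway then reads as a configuration change.

# Giving the column an explicit DEFAULT (or using DATETIME, as the schema
# originally did) suppresses the implicit auto-update behavior:
SAFE_DDL = """
CREATE TABLE sqlt_sc_demo (
    id INT PRIMARY KEY,
    configchange TIMESTAMP DEFAULT 0,  -- no implicit ON UPDATE attribute
    leaseexpiration DATETIME
)
"""
```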

I'm not really crazy about this invalidation behavior, but I can live with configchange causing tags to go stale even when the scan class hasn’t really changed substantially. Better would be a diff of the current config against the database, but that’s more complex and probably more likely to have unexpected consequences.

Ahh… I’m glad you figured it out. Yes, hitting configchange definitely reloads the scan class, but I’ll have to look into what changed exactly. You’re right that it should be a bit smarter about how changes are applied. I suspect right now the tags are being transferred to the new scan class (and reinitialized in the process) no matter what. There certainly were changes in how the life cycle of objects is managed, so it’s likely that while changes were previously applied to the running scan class, they’re now creating a new instance. We should be able to get it cleaned up fairly easily, though I would definitely recommend changing the DB so that configchange isn’t modified each time. With so many scan classes, it’s bound to be inefficient to read them all every time.

Regards,