Last night we changed one of our transaction groups from a 60 second poll rate to a 1 second poll rate. When we came in today to look at the data we saw missing sections of data. The pattern is somewhat consistent: about 2 minutes of data, then 30-60 seconds of no data. The transaction group is filled with OPC tags from a Siemens PLC. Where should I look, or what should I do, to troubleshoot this issue?
Can you post or email the log files?
Also, take a look at the raw data in the database. Are there gaps there as well or are the gaps just appearing when the data is graphed?
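One way to check the raw table is to flag any gap between consecutive timestamps that exceeds the poll rate. This is only a sketch of the gap-finding logic; the timestamps below are fabricated to mimic a 1-second poll with one hole in it, and in practice you'd feed in the timestamp column pulled from your transaction group's table.

```python
# Sketch: find gaps larger than a threshold in a list of row timestamps.
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(seconds=5)):
    """Return (start, end) pairs where consecutive rows are further apart than max_gap."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps

# Fabricated example: a 1-second poll rate with one ~40 s hole in the middle.
ts = [datetime(2012, 1, 5, 9, 24, 0) + timedelta(seconds=s)
      for s in list(range(0, 120)) + list(range(160, 200))]
for start, end in find_gaps(ts):
    print("gap from", start, "to", end, "-", (end - start).total_seconds(), "s")
```

If the gaps show up here too, the rows were never inserted, which points at the gateway rather than the charting.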
here you go. i changed the extension from bin to txt.
missingdata.logs.txt (498 KB)
It doesn’t look like the missing data is due to communication timeouts with the driver, so I’d start looking at the data itself now.
I don’t know if you turned the loggers for that device up to TRACE on purpose, or if it was left over from prior troubleshooting, but you should turn the levels back down to INFO now because your logs are full of trace-level messages.
yes, i just turned the trace on this morning to see if it showed me anything useful. i have turned the trace off.
When you say look at the data do you mean in the database table? What I see on the chart is what I see in the database table. Could the transaction group not be writing to the table for some reason?
Can you send in some logs, without TRACE on, that cover a time period in which some of those timeouts occurred?
Also: what are the specs of the machine this is running on? Does it happen to be something low-end like a single-core VM?
attached is the graph, the database table data, and the log without trace on from ~9:24 AM to ~9:42 AM today.
the server runs on a VMware host with 1 vCPU (Intel Xeon Core 2) and 2048 MB of RAM
missingdata0924-942.logs.txt (521 KB)
missingdata 0924-0942.csv (153 KB)
With only 1 core and the regularity of the problem (~every 2 minutes), I’m almost certain this is due to the internal DB auto-backups.
You can turn off auto-backups by editing your gateway.xml file (found in the data/ folder in 7.3.x, and… somewhere else inside the install directory in 7.2.x… I’ll get back to you when I find it…).
Change the following lines:
Also just for reference, you’ll never hear us recommend running Ignition on a single-core machine. You should get the VM changed to be dual-core if possible, especially if you aren’t comfortable turning auto-backups off.
Will increasing the CPU cores cause the licensing to go into trial?
No, it is not based on the CPU.
The problem is still happening after i made a change to the gateway.xml file. How do I know if the backups have stopped?
this is what i did:
- copied the gateway.xml file to the desktop
- changed the gateway.xml file on the desktop as follows:
[code]<?xml version="1.0" encoding="UTF-8" standalone="no"?>[/code]
- stopped the gateway service
- copied the edited gateway.xml from the desktop back into the data folder, replacing the original
- started the gateway service
after a few minutes i saw the database start missing records again. in the console i see a line that leads me to believe the gateway is still performing auto-backups. the console says:
the console prints this line about every 3 minutes.
Ah, yes, you turned off the auto backup of the primary internal database, but the data cache db backup is still running as well. I’m not sure it’s currently possible to turn that off, though it definitely should be.
However, it really shouldn’t take over a minute to execute. I suspect the cache db is very large, and we’ve seen once or twice the file becoming large even with no data in it (that is, the file isn’t being “vacuumed”, or shrunk back down, after the data is forwarded).
Try this, in order to confirm that this is causing the problem:
- Shut down the gateway
- Find the data cache directory under “context/main/datacache” (7.2) or “data/datacache” (7.3). The db will be in the “FactorySQL” folder there.
- Rename that “FactorySQL” folder to something else.
- Restart the gateway; a new FactorySQL folder will be made.
A fresh data cache db should take only milliseconds to back up. If this helps, we’ll need to look at either turning off auto-backups for the data cache or vacuuming the db file.
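The rename step above can be scripted so it's repeatable (with the gateway stopped first). This is just a sketch: the install path in the usage comment is an assumption you'd adjust for your system, and "data/datacache" matches the 7.3.x layout mentioned above.

```python
import os

def archive_data_cache(cache_dir, folder="FactorySQL", suffix="_old"):
    """Rename the data cache folder so the gateway rebuilds a fresh one on restart.

    Returns the new path, or None if the folder wasn't found.
    """
    old = os.path.join(cache_dir, folder)
    new = os.path.join(cache_dir, folder + suffix)
    if os.path.isdir(old):
        os.rename(old, new)
        return new
    return None

# Usage (gateway stopped first); install path below is an assumed example:
# archive_data_cache(r"C:\Program Files\Inductive Automation\Ignition\data\datacache")
```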
yes, now the console says
It has been 3.5 weeks since I refreshed the data cache and now the console says the backup is taking 9 seconds.
I am going to go ahead and refresh the cache again, but is there something we can do to prevent this from happening?
We’ve turned off auto-backups on the data cache for 7.3.3, which will definitely prevent this, but the fact that it’s occurring at all on your system suggests that a lot of data is going through the data cache. If you’re not aware of the connection to the db going down, this probably means the store and forward settings are such that everything ends up passing through the cache. That isn’t all that uncommon; unfortunately the default settings are a bit susceptible to it.
- First, check out the status page for the store and forward system to verify that my guess is correct. You should see that a lot of entries have gone through the cache. If the connection to the db doesn’t go down, there really should be none.
- If it looks like that’s happening, and the settings for the store and forward system are at their defaults, it means that more than 25 records are coming in at a time, or records are taking more than 5 seconds to forward, which trips the trigger to go to the cache. Edit the store and forward settings and, under [i]Store Settings[/i], increase the write size and write time. These are either/or triggers, so something like 100 records or 60 seconds should be good.
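The either/or behavior of those two settings can be sketched as a toy model: a batch is pushed to the cache when it exceeds a record count OR a maximum age, whichever happens first. The 25-record/5-second defaults and the 100/60 suggestion come from the post above; the class itself is purely illustrative, not Ignition's actual API.

```python
class StoreTrigger:
    """Toy model of the store-and-forward write trigger (illustrative only)."""

    def __init__(self, write_size=25, write_time=5.0):
        self.write_size = write_size    # flush when this many records are pending
        self.write_time = write_time    # ...or when the oldest record is this old (s)

    def should_flush(self, pending_records, oldest_age_seconds):
        # Either condition alone is enough to trip the trigger.
        return (pending_records >= self.write_size
                or oldest_age_seconds >= self.write_time)

default = StoreTrigger()                              # 25 records or 5 s
relaxed = StoreTrigger(write_size=100, write_time=60.0)

# A burst of 30 one-second records trips the default trigger,
# but under the relaxed settings it forwards directly:
print(default.should_flush(30, 4.0))
print(relaxed.should_flush(30, 4.0))
```

This is why raising both values helps: bursts that used to detour through the cache (and its ever-growing db file) now get forwarded directly.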
- Once you’ve applied the update, if there were any records still in the store, hopefully the system can catch up. Once that’s done, you can clear the datacache db as before.
I think the reason the auto-backups take longer over time is that we don’t compact the data cache db. As new data is put in, the file grows and never shrinks, so it gets bigger over time. Edit: I just had another thought. Check the quarantine for the store and forward pipeline. If some data is being quarantined, the database will grow over time (up to the limit specified in the settings).