I had an issue that I wanted to put out to the community to see if others have seen it and how they handled it.
I'm running Ignition 7.1.5 on a Win2003 Server in a VMware vSphere setup.
MS SQL 2005 is running on physical hardware under Windows Server 2008 64-bit.
I have several projects running: 102 transaction groups (all but 20 of them Historical Groups) with a total of 524 OPC tags. All of these groups log either on a trigger (maybe once every 15 minutes) or on 10-second, 30-second, or 1-minute triggers. I also have a group of alarms that are only using alerts. In reality, nothing here is putting much load on anything.
The other day my SQL log file filled up and ran out of space, which stopped anything from being logged to the DB. This was my fault, as the DB was set to the Full recovery model rather than Simple. This has been fixed.
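For anyone who runs into the same thing, here is a rough sketch of the checks involved, written as a small Python helper that only builds the T-SQL text and does not connect to anything. The database name `IgnitionHistory` and the helper function names are made up for this example; the statements would still have to be run by whoever administers the SQL Server, and switching to Simple gives up point-in-time restore, so treat it as an illustration rather than a recommendation.

```python
def quote_name(db_name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + db_name.replace("]", "]]") + "]"


def recovery_model_query(db_name):
    """T-SQL that reports the recovery model (FULL, SIMPLE, BULK_LOGGED)."""
    escaped = db_name.replace("'", "''")  # escape quotes in the string literal
    return (
        "SELECT name, recovery_model_desc FROM sys.databases "
        "WHERE name = N'%s';" % escaped
    )


# T-SQL that reports log file size and percent used for every database.
LOG_SPACE_QUERY = "DBCC SQLPERF(LOGSPACE);"


def set_simple_recovery(db_name):
    """T-SQL that switches the database to the Simple recovery model."""
    return "ALTER DATABASE %s SET RECOVERY SIMPLE;" % quote_name(db_name)


if __name__ == "__main__":
    db = "IgnitionHistory"  # hypothetical database name
    print(recovery_model_query(db))
    print(LOG_SPACE_QUERY)
    print(set_simple_recovery(db))
```

Running the first two statements on a schedule would have shown the log filling up well before it ran out of space.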
But for the duration that the database was down and unavailable (approximately 12 hours), the store and forward system cached and then quarantined all the data being logged. Once the database came back online, only part of the data was being logged to it. Most unfortunately, I was out of town and couldn’t do anything about it. My IT guy changed the recovery model to get it back up.
The system ran this way for 2 days. When I was finally able to get online with it, I found over 80K records in the quarantine.
I went into the system and told everything to retry. The records were queued up as pending transactions in the local cache. This was 6pm on Wednesday. At 5am Thursday, there were still 80K records pending. Also, my server running Ignition was bogged down to the point where I almost couldn’t open Task Manager to see what was going on. Eventually, I found that the server was running at between 60 and 90% CPU and java.exe had 1.2GB of memory allocated.
At that point, I went into Ignition and took all my transaction groups offline, as I absolutely had to get the pending transactions into the database. Once I did, it started inserting them at a rate of about 10K per hour and sped up, so that by 10am the local cache was empty.
As a last item, java.exe memory usage was still at 1.2GB, and it took stopping and restarting the gateway to release it.
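To put rough numbers on that, here is a quick back-of-the-envelope calculation. It assumes the full 80K records were still pending when the drain started at about 5am and that the cache was empty at 10am; both figures are approximate, and the variable and function names are just for illustration.

```python
def average_rate(records, hours):
    """Average insert rate in records per hour over the whole drain."""
    return records / hours


def hours_at_constant_rate(records, rate_per_hour):
    """How long the drain would take if the rate never changed."""
    return records / rate_per_hour


PENDING = 80_000           # records pending at 5am (approximate)
START_HOUR, END_HOUR = 5, 10
INITIAL_RATE = 10_000      # records per hour when the drain started

avg = average_rate(PENDING, END_HOUR - START_HOUR)
eta = hours_at_constant_rate(PENDING, INITIAL_RATE)

print(f"average drain rate: {avg:.0f} records/hour")   # 16000 records/hour
print(f"time at the initial rate: {eta:.1f} hours")    # 8.0 hours
```

So the drain averaged roughly 16K records per hour, well above the 10K per hour it started at, which fits with it speeding up once the groups were offline.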
This doesn’t make me feel real comfortable, to say the least.
My 1st question is: why did 80K valid transactions get dumped into quarantine? I would expect that if it were indeed bad data, the retry would have just dumped it back into quarantine. It didn’t. I would also expect this data to have been held in the local cache until the database came back online and then inserted into the database automatically. Again, it wasn’t.
2nd, why did emptying the local cache use as much memory and CPU as it did?
All of my store and forward settings were at defaults except for the max records which I set to 500K a few weeks ago.
Any thoughts would be appreciated as this is just a small part of the data collection system and it will be expanded as time goes on.
Thanks…