Data logging failure after power outage

weertske · February 16, 2011, 7:26pm

Yesterday my customer had a 4 hour power outage. During this time, Ignition remained running on a UPS supported PC as did the SQL machine. However, switches lost power and at least some of the PLCs lost power. After power was restored, Ignition was no longer logging data to some of the historical transaction groups. Fortunately, I went online a few hours later and noticed the problem. I fixed it by restarting Ignition; logging resumed properly.

What should I do about this? Is there some programming on your end that needs fixing, or is there some setting that I can adjust to force Ignition to automatically reset? This customer is in a harsh climate and regularly experiences multi-hour power outages.

Running version 7.2.3.

This is not the first time this has happened. It also occurred when running 7.1.8.

Colby.Clegg · February 17, 2011, 12:49am

Hi,

The system certainly should have resumed on its own. Are you using the drivers in Ignition, or a third-party OPC server? If you look in the logs directory, under “{Install Dir}/contexts/main/logs”, there should be an entry from yesterday. If you zip that up and send it to support@inductiveautomation.com we might be able to get a better idea of what happened.

Regards,

Colby.Clegg · February 17, 2011, 12:55am

Sorry, I posted without really looking at your name… I know from our other threads that you’re using the Ignition drivers.

Anyhow, another thought: was Ignition logging, but only bad data, or was it not logging anything? Is the database on the local machine, or a remote machine? If remote, does your store and forward system status show any data in the system? (It’s under the main “Status” button in the gateway.)

The logs will likely help, but the question is whether the drivers didn’t reconnect, or the store and forward system didn’t allow new data through.

Regards,

weertske · February 17, 2011, 5:02pm

My database is on another machine. Store and forward shows 2 items quarantined under the local cache. I will send the logs.

Cas · March 2, 2011, 1:28pm

I am experiencing a similar problem.
When our machine operators cycle the power to the PLCs (which happens at least once per week), Ignition will sometimes reconnect and sometimes not. The last incident was at 6:30AM yesterday and wasn’t noticed until 6AM today: nearly 24 hours with no data. Ouch.

I have a scheduled Service Restart set up for 5:30AM on Sunday to clear any connection problems prior to the Monday morning shift, but any power off/power on cycle during the rest of the week can cause the problem, which could go unnoticed for hours.
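One way to keep a stall like this from going unnoticed for hours is an independent check on the history table itself. Below is a minimal Python sketch, assuming a DB-API connection to the logging database and hypothetical table/column names (`history_table`, `t_stamp`) standing in for whatever the transaction group actually writes to. It flags the group as stale when the newest row is older than a chosen gap. The demo uses an in-memory SQLite database only so that it runs anywhere.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical names: substitute the table and timestamp column that the
# historical transaction group actually writes to.
TABLE = "history_table"
TS_COLUMN = "t_stamp"


def last_logged(conn):
    """Return the newest timestamp in the history table, or None if it is empty."""
    cur = conn.cursor()
    cur.execute(f"SELECT MAX({TS_COLUMN}) FROM {TABLE}")
    value = cur.fetchone()[0]
    if value is None:
        return None
    # Some drivers return datetime objects; SQLite returns ISO-8601 text.
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def is_stale(last, now, max_gap):
    """True if nothing has been logged within max_gap before now."""
    return last is None or now - last > max_gap


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {TABLE} ({TS_COLUMN} TEXT)")
    conn.execute(f"INSERT INTO {TABLE} VALUES (?)", ("2011-03-01T06:30:00",))
    # A group expected to log every few minutes; alarm if 15 minutes pass quietly.
    print(is_stale(last_logged(conn), datetime(2011, 3, 2, 6, 0), timedelta(minutes=15)))
```

Run on a schedule every few minutes, a check like this could send an alert as soon as logging stops instead of waiting for someone to notice an empty report.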

In the Standard group, my OPC tags display as STALE.
I am using OPC UA on Ignition Platform 7.2.2.
Ignition is running on a separate server, and the drivers are ControlLogix5500.

I’m hoping there is some way I can get Ignition to reconnect to the PLC every time the PLC is powered on.

Thanks,
Cas

weertske · March 3, 2011, 4:31pm

Data logging stopped again and this time I am not sure it was due to a power outage. I lost two days of data. It showed up this morning when I issued a report that had no information in it.

I sent my log from the Feb 17 event and no one got back to me. Do you want me to send the log from this event?

weertske · March 3, 2011, 5:00pm

Further info on this most recent event.

Although there was no power outage, the SQL machine had a brief (10 minute) problem during which it was not accepting data. It was at this time that data logging broke. I think the log is very informative; I will send it in for review.

My store and forward engines are empty except for 2 quarantined items.

Colby.Clegg · March 3, 2011, 5:50pm

Hi,

Sorry for not getting back to you on the first log. Basically, the log doesn’t show much beyond slow PLC communication, so I sent it over to the driver guys to check out, but they didn’t find much. This most recent log that you emailed is similar, but the interesting thing here is a mysterious shutdown at 1:46am. What time was your database event?

In an effort to reduce the number of timeout errors in the database, perhaps you should try increasing the “communication timeout” on your devices from 2 seconds up to something like 10 seconds. The logs have many instances of a request timing out, only for the response to come in 1 or 2 seconds later.
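To see why the longer timeout should help: a request that times out at 2 seconds and whose response arrives 1 or 2 seconds later actually took roughly 3 to 4 seconds. The short Python sketch below is illustrative only; it uses just those two implied latencies (not figures taken from the log) and counts how many such requests would fail at each timeout setting.

```python
def timed_out(latencies_s, timeout_s):
    """Return the request latencies (seconds) that exceed the timeout."""
    return [t for t in latencies_s if t > timeout_s]


# A request that times out at 2 s and whose reply arrives 1-2 s later
# really took about 3-4 s end to end.
observed = [2.0 + late for late in (1.0, 2.0)]

for timeout in (2.0, 10.0):
    failed = timed_out(observed, timeout)
    print(f"timeout {timeout:g} s: {len(failed)} of {len(observed)} requests time out")
```

With a 2 second timeout both requests fail even though the PLC did answer; at 10 seconds neither does.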

For both you and Cas above, if possible, I would highly recommend updating to the latest 7.2.4 beta. There have been a couple of fixes made that directly relate to reconnecting after disconnects/power outages.

Regards,

weertske · March 3, 2011, 6:15pm

At 1:46 in the SQL machine event log is the following message: “The server {7E477741-01A6-4C06-9DAC-55F6174C08A3} did not register with DCOM within the required timeout”

I am guessing the DCOM event caused Ignition to crash. However, this was only a momentary event. The SQL machine has not been reset, and after I reset the Ignition machine, data logging started working again.

By the way, I couldn’t just restart Ignition to solve this problem. Ignition was completely locked up; I had to reboot the machine.

I will increase my timeout settings and try the beta.

Colby.Clegg · March 3, 2011, 7:25pm

I really don’t think a random DCOM event on a different machine would affect Ignition. I can’t find a definite answer on what process that GUID refers to, but it seems to be something by Symantec.

However, the coincidence is striking that something happened on that machine, and at the exact same time (1:46am) something on the Ignition machine triggered it to shut down.

By chance do you still have the wrapper.log files that span that time? Ignition keeps 5 of them in the install directory; see whether their timestamps still cover that day. They might give a better indication of why Ignition tried to stop (or what tried to stop it).
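If the wrapper logs roll over in sequence, a small script can pick out which files might span a given day. The Python sketch below assumes rolled files named `wrapper.log`, `wrapper.log.1`, and so on, each covering the period from the previous file's last write up to its own last write; the directory passed in is whatever the install directory is on that machine.

```python
import glob
import os
from datetime import datetime, timedelta


def wrapper_logs_for_day(log_dir, day):
    """Return wrapper log files whose time span may include `day`.

    Assumes rolled files named wrapper.log, wrapper.log.1, ... where each file
    covers the period between the previous file's last write and its own.
    The oldest file's start time is unknown, so it is included whenever its
    last write falls on or after the start of the day.
    """
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    # Oldest first, by last-modified time.
    files = sorted(glob.glob(os.path.join(log_dir, "wrapper.log*")), key=os.path.getmtime)
    hits = []
    for path in files:
        mtime = datetime.fromtimestamp(os.path.getmtime(path))
        if mtime >= start:
            hits.append(path)
            if mtime >= end:
                break  # this file runs past the day; newer files start later
    return hits
```

An empty result means the day has already rolled out of the retained files.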

Does the Ignition machine have any information from that time period in its system event log?

Regards,

weertske · March 3, 2011, 9:02pm

I sent the wrapper log.

In answer to your question about the Ignition machine event log, there were two events, neither of which I think is related.

At 1:44 this event occurred: “Reset to device, \Device\RaidPort0, was issued.” I think this is a maintenance event.
At 1:46 this event occurred: “Windows failed to apply the Internet Explorer Zonemapping settings. Internet Explorer Zonemapping settings might have its own log file. Please click on the ‘More information’ link.” This is a nuisance event that occurs regularly.

Colby.Clegg · March 3, 2011, 10:16pm

Hi,

Thanks for sending that, it is indeed more useful than the HTML log.

Um, I really don't think this is normal. I put that error into Google, and a bunch of pages popped up with people complaining that after getting that message, their server was no longer usable until rebooted.

Along those lines, minutes after that happened, we see in the wrapper log:

And then the gateway shuts down. This is the "wrapper" that we use (the executable responsible for running Ignition) noticing that the system is frozen and trying to recover. A little further on in the log, we see that it fails 5 times to get things going, and gives up.
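The recovery behavior described above (detect a hang, restart, stop after a fixed number of failures) can be sketched generically in Python. This is not the wrapper's actual code: `watchdog` and `recover` are hypothetical names, and the limit of 5 simply mirrors what the log shows. It also shows why restarting cannot help when whatever froze the machine is still present: a health check that never succeeds uses up every attempt.

```python
from typing import Callable, List


def recover(restart: Callable[[], None],
            is_responsive: Callable[[], bool],
            max_attempts: int = 5) -> List[str]:
    """Restart a hung service until it responds or max_attempts is used up."""
    log = []
    for attempt in range(1, max_attempts + 1):
        restart()
        if is_responsive():
            log.append(f"attempt {attempt}: service responding")
            return log
        log.append(f"attempt {attempt}: no response")
    log.append(f"giving up after {max_attempts} failed attempts")
    return log


def watchdog(ping: Callable[[], bool],
             restart: Callable[[], None],
             max_attempts: int = 5) -> List[str]:
    """Check the service once; if the ping fails, try to recover it."""
    if ping():
        return ["ping ok"]
    return ["ping timed out; restarting"] + recover(restart, ping, max_attempts)


if __name__ == "__main__":
    # A machine whose disk subsystem is hung: nothing ever responds.
    for line in watchdog(ping=lambda: False, restart=lambda: None):
        print(line)
```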

This is interesting, because I suppose it could indicate a problem with the RAID controller. And when RAID controllers don't work well, the computer usually doesn't work well. In fact, I'm currently using a backup machine because my primary blue-screened and is now too slow to use while the RAID rebuilds.

Hope this helps a bit with the troubleshooting,

weertske · March 3, 2011, 10:22pm

Thank you for looking into it further. I will contact the IT group and point out that the RAID may be failing.