The reason I’m asking is that we run our gateway in a VM on a hosted Windows server, and when the hosts do patching they take a snapshot of the VM.
We also have PLCs that communicate with our gateway over OPC. While the patching is ongoing, the PLCs attempt to publish data to our server, find that the connection is lost, and after 30 s delete the old session. When the server is finished with the patch, it thinks it still has a valid OPC connection and the old session ID. The PLCs won’t respond on the old connection, so our server decides it has lost the connection and establishes a new one. After this, tags that update often have a quality of Good, but tags that update rarely have a quality of “uncertain_unknownLastValue” after the reconnect.
One fix was to manually restart all the tags, after which everything was good again, but that is not a viable approach. So I’m asking whether it is possible to read the Windows event logs, so that the OPC connection is disabled when a snapshot is taken and enabled again when the patch is over.
Otherwise, my other idea is a little script that saves the current time every n seconds and compares it with the previously saved time. After a patch, the gap between the two would be much larger than n, and if it is over a certain threshold the script would restart all OPC connections. This would have to run on a fixed timer, though, and it would be nicer if we could do it on events instead.
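The time-comparison idea above could be sketched roughly as follows. This is plain Python rather than an Ignition gateway script, and `TimeGapDetector`, `poll_seconds` and `tolerance_seconds` are hypothetical names chosen for illustration. It also assumes the guest clock jumps forward once the VM resumes after the snapshot, which depends on how time sync is configured on the host.

```python
import time


class TimeGapDetector:
    """Flags a pause when the wall clock jumps further than expected between polls."""

    def __init__(self, poll_seconds, tolerance_seconds, clock=time.time):
        self.poll_seconds = poll_seconds            # how often check() is expected to run
        self.tolerance_seconds = tolerance_seconds  # slack for normal timer jitter
        self.clock = clock                          # injectable so the logic can be tested
        self.last_seen = clock()

    def check(self):
        """Return True if the gap since the previous check suggests the VM was paused."""
        now = self.clock()
        gap = now - self.last_seen
        self.last_seen = now  # always store the new time for the next comparison
        return gap > self.poll_seconds + self.tolerance_seconds
```

A timer would call `check()` every `poll_seconds` and restart the OPC connections whenever it returns True. The tolerance keeps an ordinary late timer tick from triggering a restart.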
I have read other posts about this exact problem, but nobody knew what caused it and a fix was never found, because the posters were directed to contact support. I have a support ticket open as well, but since I work in a different time zone and answers from support can take hours, I thought I would also ask the smart people of the Inductive forums.
I would make an expression tag that checks how long ago the Connected status last changed (current time vs. the status timestamp) and whether the status is False. Something like
secondsBetween({[System]Gateway/OPC/Connections/Ignition OPC UA Server/Connected.Timestamp}, now(10000)) > 30 && !{[System]Gateway/OPC/Connections/Ignition OPC UA Server/Connected}
This checks two things: the current time is more than 30 seconds past the last change of the OPC UA Connected status, and the status value is False. In other words, it has been more than 30 seconds since the connection status turned False.
Then do a gateway tag change event on this expression tag to do whatever logic you need.
If you don’t want to do the expression tag / gateway tag change event, it’s more or less equivalent to use a gateway timer script that looks at these tags every X seconds and decides what to do.
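The decision a timer script would make can be sketched as a small pure function. This is an illustration in plain Python, and `connection_needs_reset` and its parameters are hypothetical names. In an actual gateway script the two inputs would come from reading the Connected tag and its timestamp (for example with `system.tag.readBlocking`), which is left out here so the logic can run on its own.

```python
from datetime import datetime, timedelta


def connection_needs_reset(connected, status_changed_at, now, threshold_seconds=30):
    """True when the status has been False for longer than threshold_seconds."""
    if connected:
        # A healthy connection never needs a reset, however old its timestamp is.
        return False
    return (now - status_changed_at) > timedelta(seconds=threshold_seconds)
```

Keeping the check separate from the tag reads means the same rule can be reused from a tag change event or a timer script.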
I would definitely not bother going down the road of trying to read Windows event logs from within the Ignition gateway.
In the meantime I have been looking into using WebDev, where I can send a POST to the server that calls a message handler to restart the OPC connections. I’m doing this because VMware, which the VM runs in, can use pre/post snapshot events to fire a PowerShell script that sends a POST request to the server.
I’m still testing to see whether it works as I intend. If it does, I have an event-driven solution.
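For illustration, the client side of that POST might look something like the sketch below. It is written in Python as a stand-in for the PowerShell script mentioned above. The host name, project name, resource path and the `phase` field are all hypothetical. The `/system/webdev/<project>/<resource>` path follows the usual WebDev URL layout, but check it against your own gateway.

```python
import json
import urllib.request


def build_reset_request(base_url, project, resource, phase):
    """Build (but do not send) a POST telling the gateway which snapshot phase fired."""
    url = "{}/system/webdev/{}/{}".format(base_url.rstrip("/"), project, resource)
    body = json.dumps({"phase": phase}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The post-snapshot hook would then send it with something like `urllib.request.urlopen(build_reset_request("http://gateway:8088", "MyProject", "opc/reset", "post-snapshot"), timeout=10)`, and the WebDev resource would pass the phase on to the message handler.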
Yeah, if you have WebDev and can accept a POST, then the rest of this should work fine. You can use ProcessBuilder to run your PowerShell script when a POST is received. Then I’d tell IT to make this part of their patching runbook: do their patches, and then run a script that sends the POST request to your WebDev endpoint to reset the OPC connections.
The thing is, production is 24/7 for our project.
We are paying a third party to host a Windows server, which we run our gateway on.
They do patching every 3rd Saturday at midnight, which is perfectly fine. The only bummer is that when the backup is finished, our OPC connection is lost for a short period and then reconnects. We have set up a syslog server to monitor the PLCs, where we could see that a logout was never sent and a login was sent again.
I think we have found a solution: we can get our server hosts to run Windows PowerShell scripts that disable the OPC connections before a backup and then, afterwards, run another script to enable the OPC connections again.
Anyway, the problem isn’t that catastrophic. If this works, that’s nice; otherwise I guess we can live with it as it is.
You might consider redundant servers then, and instead of the servers just being upgraded automatically, patching needs to be monitored so you can fail over to the server that's not being patched. As @pturmel mentioned, there need to be maintenance windows and a procedure put in place to patch them in a coordinated manner. This is OT, not IT, and while similar, there are very distinct differences. A server being patched isn't just someone not being able to use an office app for a bit, it's production, and downtime = $$ lost, not just wasting someone's time waiting for a server to come back online.