We have an issue that has happened on more than one occasion, but we are unable to locate a cause.
Our tag change events occasionally stop firing, causing all of our lines to stop tracking OEE. We have logic in place to update the OEE Status, Infeed, and Outfeed counts whenever a tag is updated from a controller.
On a couple of occasions we have tried resetting the OPC UA module without success. In order to get tag change event scripts to fire again we end up having to restart the gateway. After restarting the gateway we have to adjust the numbers manually, causing our shift information to be incorrect.
We are unable to find anything in the logs that helps us determine what this issue is. Server and Gateway performance is stable throughout the failure. This issue seemingly happens at random, and is happening with increasing frequency. On average it's occurring around once every 2 weeks.
Any help would be appreciated. Have you had similar occurrences? Is there anything specific we can look for in our logs?
Thanks for the quick response! That’s a good point, Kevin. We will be sure to grab this next time. We export all of the logs to Elastic. Is there anything we can look for in those before it happens again?
It’s not likely that this issue shows up in your logs. My guess is that you eventually end up with a number of tag change scripts all blocked indefinitely on something like a network call. The thread dump will show us if this is the case.
The cases I recall where this was happening and then found and fixed were caused by infinite delays in tag events. Since they run in a pool, such delays can exhaust the thread pool. You won’t find anything in the logs as there’s “nothing wrong” visible outside the scripts in question.
Consider instrumenting your tag events with begin & end calls to track scripts that are still running. One technique to consider: add an entry to a top-level set() object in a script module on begin and remove it on end, possibly logging the contents of the set as each event begins.
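A rough sketch of that begin/end bookkeeping, in plain Python so it can be tried outside the gateway. The names `begin`, `end`, and `onTagChange` are made up for illustration, the standard `logging` module stands in for whatever logger you use on the gateway, and the lock is an assumption on my part since events run concurrently in a pool.

```python
import logging
import threading

# Hypothetical script-module top level: one shared set of in-flight events.
myRunningEvents = set()
_lock = threading.Lock()
_log = logging.getLogger("TagEvents")  # stand-in for the gateway logger


def begin(tagPath):
    """Record that an event for tagPath has started on the current thread."""
    token = (tagPath, threading.current_thread().name)
    with _lock:
        myRunningEvents.add(token)
        snapshot = sorted(myRunningEvents)
    # Log everything still in flight each time a new event begins.
    _log.info("begin %s; running: %s", tagPath, snapshot)
    return token


def end(token):
    """Remove the entry added by begin()."""
    with _lock:
        myRunningEvents.discard(token)


def onTagChange(tagPath, work):
    """Wrap the real event body so end() runs even if work() raises."""
    token = begin(tagPath)
    try:
        work()
    finally:
        end(token)
```

The try/finally only covers scripts that raise. A script that blocks forever never reaches end(), and that is the point: its entry stays in the set and keeps showing up in the logged snapshot.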
When the problem manifests, use a gateway message handler that can enumerate the contents of myRunningEvents, perhaps dumping a trace for each, instead of hunting through an entire gateway of threads.
If you want to get fancy, you could use a dictionary instead of a set, with the thread as the key and information about the event as the value, possibly including a start timestamp. That would help narrow down which tag event is the culprit.
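Something along these lines, again as plain Python rather than tested gateway code. `runningEventsByThread` and `report` are hypothetical names. The stack lookup uses CPython's `sys._current_frames()`, so treat that part as an assumption: on the gateway's Jython you may need to pull the stack from the underlying Java thread instead, which is not shown here.

```python
import sys
import threading
import time
import traceback

# Hypothetical script-module top level: thread -> info about its event.
runningEventsByThread = {}
_lock = threading.Lock()


def begin(tagPath):
    """Register the current thread with the tag path and a start timestamp."""
    info = {"tagPath": tagPath, "started": time.time()}
    with _lock:
        runningEventsByThread[threading.current_thread()] = info


def end():
    """Remove the current thread's entry, if any."""
    with _lock:
        runningEventsByThread.pop(threading.current_thread(), None)


def report(minAgeSeconds=0.0):
    """Return text lines for events running at least minAgeSeconds."""
    now = time.time()
    frames = sys._current_frames()  # assumption: available in this runtime
    with _lock:
        items = list(runningEventsByThread.items())
    lines = []
    for thread, info in items:
        age = now - info["started"]
        if age < minAgeSeconds:
            continue
        lines.append("%s on %s running for %.1fs"
                     % (info["tagPath"], thread.name, age))
        frame = frames.get(thread.ident)
        if frame is not None:
            lines.extend(traceback.format_stack(frame))
    return lines
```

A message handler could call something like report(60) and log or return the result, so you only see events that have been stuck for at least a minute, each with the tag path that started it.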