Tag Change Event Scripts not Firing

robert.engwer · January 29, 2021, 6:29pm

We have an issue that has happened on more than one occasion, but we are unable to locate a cause.

Our tag change events occasionally stop firing, causing all of our lines to stop tracking OEE. We have logic in place to update the OEE Status, Infeed, and Outfeed counts whenever a tag is updated from a controller.

On a couple of occasions we have tried resetting the OPC UA module without success. In order to get tag change event scripts to fire again we end up having to restart the gateway. After restarting the gateway we have to adjust the numbers manually, causing our shift information to be incorrect.

We are unable to find anything in the logs that helps us determine what this issue is. Server and Gateway performance is stable throughout the failure. This issue seemingly happens at random, and is happening with increasing frequency. On average it's occurring around once every 2 weeks.

Any help would be appreciated. Have you had similar occurrences? Is there anything specific we can look for in our logs?

Ignition: 7.9.16
Sepasoft: 1.9.5

Kevin.Herron · January 29, 2021, 6:36pm

Next time this happens export/save a thread dump from the diagnostics section in the gateway.

Ideally, you could call support while this is happening and let them take a look as well.

robert.engwer · January 29, 2021, 6:42pm

Thanks for the quick response! That’s a good point Kevin. We will be sure to grab this next time. We export all of the logs to Elastic. Is there anything we can look for in those before it happens again?

Kevin.Herron · January 29, 2021, 6:44pm

It’s not likely that this issue shows up in your logs. My guess is that you eventually end up with a number of tag change scripts all blocked indefinitely on some network call or something. The thread dump will show us if this is the case.

pturmel · January 29, 2021, 6:46pm

In the cases I recall where this happened and the cause was found and fixed, the culprit was tag events that blocked indefinitely. Since they run in a pool, such delays can exhaust the thread pool. You won’t find anything in the logs as there’s “nothing wrong” visible outside the scripts in question.
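To make the pool exhaustion concrete, here is a minimal standalone sketch. It assumes a generic fixed-size pool from CPython 3's concurrent.futures, not Ignition's actual tag event executor, and the function name and pool size are made up for illustration. Two "event scripts" block forever, so a healthy third one never gets a thread until they are released.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def demo_pool_exhaustion(pool_size=2, wait_seconds=0.2):
    """Return (starved, result) for a healthy task queued behind blocked ones."""
    release = threading.Event()  # stands in for a network call that never returns
    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        # Occupy every worker with a task that blocks until released.
        for _ in range(pool_size):
            pool.submit(release.wait)
        # A healthy event queued behind them cannot start.
        healthy = pool.submit(lambda: "fired")
        try:
            healthy.result(timeout=wait_seconds)
            starved = False
        except TimeoutError:
            starved = True
        # Unblock the stuck workers; the queued event now runs.
        release.set()
        return starved, healthy.result(timeout=5)
    finally:
        release.set()
        pool.shutdown(wait=True)
```

Nothing in this sketch raises or logs while the pool is starved, which matches the "nothing wrong" symptom: the only evidence is the blocked worker threads themselves.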

Consider instrumenting your tag events with begin and end calls to track scripts that are still running. One technique to consider: add an entry to a top-level set() object in a script module on begin and remove it on end, possibly logging the contents of the set as each event begins.

robert.engwer · January 29, 2021, 6:54pm

Thanks for the help! What would I do in the set()/remove() methods? Increment/Decrement a tag or DB record by 1 to track the number of script module threads that are running?

pturmel · January 29, 2021, 7:07pm

No, not methods. set() is a built-in Python function and data type. Your global script, named "shared.eventMonitor", would look something like this:

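from java.lang import Thread

# Threads that are currently inside a tag change event script.
myRunningEvents = set()

def eventBegin():
	# Register the calling thread as the event starts.
	t = Thread.currentThread()
	myRunningEvents.add(t)

def eventEnd():
	# discard() does not raise if the thread was never registered.
	t = Thread.currentThread()
	myRunningEvents.discard(t)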

Then your events would be like this:

shared.eventMonitor.eventBegin()
try:
	doOriginalEvent(event...)
finally:
	shared.eventMonitor.eventEnd()

pturmel · January 29, 2021, 7:09pm

When the problem manifests, use a gateway message handler to enumerate the contents of myRunningEvents, perhaps dumping a stack trace for each thread, instead of hunting through an entire gateway's worth of threads.
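A rough sketch of the enumeration step, assuming each entry in the set is a java.lang.Thread (so getName() and getStackTrace() are available). The function name describeRunningEvents is hypothetical, and the message handler wiring in the trailing comments is an assumption that has not been tested on a gateway.

```python
def describeRunningEvents(running):
    """Return one text report per thread still registered as running."""
    reports = []
    # Iterate over a copy so events that begin or end meanwhile
    # do not change the set in the middle of the loop.
    for t in list(running):
        lines = ["Tag event still running on thread: %s" % t.getName()]
        # getStackTrace() returns the frames the thread is currently in;
        # the top frames show where the script is blocked.
        for frame in t.getStackTrace():
            lines.append("    at %s" % frame)
        reports.append("\n".join(lines))
    return reports

# Assumed usage inside a gateway message handler:
#   logger = system.util.getLogger("eventMonitor")
#   for report in describeRunningEvents(shared.eventMonitor.myRunningEvents):
#       logger.warn(report)
```

Returning the reports as strings instead of logging directly keeps the helper usable from a message handler, the script console, or anywhere else that can reach the script module.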

pturmel · January 29, 2021, 7:10pm

If you want to get fancy, you could use a dictionary instead of a set, with the thread as the key, and information about the event as the value. Possibly including a start timestamp. That would help narrow down which tag event is the culprit.
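A minimal sketch of that dictionary variant. It uses Python's threading module so it also runs outside Ignition; on the gateway, java.lang.Thread.currentThread() should work as the key just as well (with getName() in place of .name). The names runningEvents and longestRunning and the description argument are made up for illustration.

```python
import threading
import time

# Maps each thread currently inside a tag event to (description, start time).
runningEvents = {}

def eventBegin(description):
    # Record which event this thread is handling and when it started.
    runningEvents[threading.current_thread()] = (description, time.time())

def eventEnd():
    # pop() with a default never raises, even if eventBegin() was skipped.
    runningEvents.pop(threading.current_thread(), None)

def longestRunning(now=None):
    """Return (seconds running, description, thread name) rows, oldest first."""
    if now is None:
        now = time.time()
    # Snapshot the items so concurrent begin/end calls do not disturb the loop.
    snapshot = list(runningEvents.items())
    rows = [(now - started, description, t.name)
            for t, (description, started) in snapshot]
    rows.sort(reverse=True)
    return rows
```

Each event would call eventBegin() with something identifying itself, such as the tag path, and call eventEnd() in a finally block. Rows that have been running for minutes or hours then point straight at the stuck event.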

robert.engwer · February 1, 2021, 2:10pm

Thanks for the breakdown! I will take this to the team this morning.