Today we found that many of our devices were either not receiving commands or getting them after very long delays (1 minute or more). After troubleshooting, it's clear that scripts executed on tags are having trouble getting through. When I run the exact same script via the script console in Designer, things work perfectly.
To sandbox the issue further, I created a simple boolean memory tag in my default tag provider and attached a simple script to its valueChanged event.
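The script is essentially a valueChanged body that compares currentValue and writes the tag back to False, something like:

if currentValue == True:
    system.tag.writeBlocking([tagPath], [False])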
In other words, if I set the value to True, it should immediately revert to False. However, when I click it on, the value never reverts to False.
The fact that this is happening with a memory tag tells me the issue is not related to our network or anything outside the Ignition server. Can someone suggest where to look from here?
Keep in mind that currentValue is a QualifiedValue object, meaning that you should be comparing the .value property of it to True, or more simply:
# You can also do `currentValue.value == True`, but this is more succinct
if currentValue.value:
system.tag.writeBlocking([tagPath], [False])
Good catch, and thanks for the instant reply. Unfortunately, even applying this change does not help the situation. Beyond my little test tag, the other tags affected have been in use for quite a while and are all suffering from this issue. Any other thoughts?
Also, in our frustration we tried all of the different write functions: system.tag.write, system.tag.writeAsync, and system.tag.writeBlocking. All behave the same (the script never completes execution).
Do you have any new scripts? Things in which some inexperienced person might have put .sleep() calls or busy loops? What you describe sounds like thread pool exhaustion caused by scripts not running to completion in very few milliseconds.
Multiple long-running query tags can have this effect too, IIRC.
Thanks for your input. I don't believe anyone has added any scripting, but it's not impossible. How do I check the thread pool? One additional comment: this issue happens immediately, even after doing a net stop ignition / net start ignition or even a full reboot of the server.
If the above isn't working, I'd be interested in seeing what your Status -> Threads page shows, specifically for gateway-tags-eventscripts-* threads, which are the ones that service tag event change scripts. In my example below I injected a sleep to block up one of the threads:
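The injected script is nothing more than a sleep in a tag's valueChanged event, along these lines (illustrative only; the 60-second value is arbitrary):

# valueChanged event body -- do NOT do this; it exists only to demonstrate the problem
import time
time.sleep(60)  # occupies one gateway-tags-eventscripts thread for a full minute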
I'll mention here that it is very important to avoid blocking in tag event scripts.
Next, review your logs for errors from a given tag change script. If they error out once, they'll produce a log entry, but recurring errors I think only emit as DEBUG (until a successful execution resets that log-spamming protection).
EDIT: as can happen around here, Phil beat me to this reply, and as he mentions below, yeah, just use the Running Scripts diagnostics. I keep forgetting that is there nowadays.
Look at Status => Diagnostics => Running Scripts in your gateway.
Quick reply: under Currently Running Scripts I do have a few which are running, sometimes for 10-20 seconds. They do not appear to be locking up, however.
The thread pool is only size == 3, so if you have three scripts each taking 10-20 seconds per event, nothing else gets to run.
Tag events need to always execute to completion in a few milliseconds--like, single-digit milliseconds.
Anything that might run longer should be on a dedicated thread in a project's gateway tag change event, not on a tag's own events.
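As a sketch of that pattern (the tag path and the work itself are placeholders): a project-level Gateway Tag Change script that hands the slow part to its own thread with system.util.invokeAsynchronous, so the event returns immediately:

# Project > Gateway Events > Tag Change script (sketch; paths are placeholders)
if not initialChange:
    # Capture the value now; the slow work runs on a dedicated thread.
    def doSlowWork(value=newValue.value):
        # e.g. a long query or a write to a device reached over the network
        system.tag.writeBlocking(["[default]Some/Other/Tag"], [value])

    system.util.invokeAsynchronous(doSlowWork)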
Under Threads I'm seeing a few gateway-tags-eventscripts threads sitting in TIMED_WAITING.
TIMED_WAITING is usually .sleep(). Find those. Kill them dead. Whoever put those in killed your server.
Thanks, guys. We've been running the way we have without issues for quite a while and don't understand why this is happening now. However, I can't argue with what you're saying... I'm going to start hunting for these delays. I'll report back what I find. Thanks!
It is a sharp edge when you go from two misbehaved scripts to three. Boom!
OK - I was able to trace the issue to a script living on a UDT which has hundreds of instances associated with it. I need to understand what is happening here, but for now I have a question:
When I COMPLETELY disable the script, I still see it popping up in the threads. Are scripts queued up if the thread pool is saturated? I even verified that the UDT change did propagate to the tag which is running this script. Please advise. If there is a queue, is there a way to clear it?
Yes, a queue of five pending events for each tag. If that is overrun, the missed event flag will be turned on for the next event for that tag that gets queued.
There's no explicit way to clear the queues that I know of. Maybe restart the tag provider? Or restart the gateway again.
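For what it's worth, that flag surfaces as the missedEvents argument of the tag's valueChanged script, so you can at least log when the queue overflowed (a minimal sketch; the logger name is arbitrary):

# valueChanged event body (sketch)
if missedEvents:
    system.util.getLogger("TagEvents").warn("Event queue overflowed for %s" % str(tagPath))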
It turns out that IT was doing something to the server where our Ignition GW lives during the time we had the above issues. General connectivity to devices was not stable. I believe that is why the scripting was stalling: the tag writes were to Modbus registers brought in over our network.
After IT finished what it was doing, everything went back to normal like magic.
This brings up a follow-up question regarding tag writes: the more recent versions of Ignition introduced the function system.tag.writeBlocking(), which I have been using. Would it make more sense to switch over to system.tag.writeAsync() or the older system.tag.write() to avoid choke points in scripting where network connectivity to devices is sometimes an intermittent issue?
Maybe. There's no free lunch - these writes will end up queued and potentially executing in the background for an unknown amount of time.
This also assumes your script can proceed without needing to know whether the write has actually happened yet.
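If it helps, system.tag.writeAsync also accepts a callback, so a script can return immediately but still log whether the write eventually succeeded (a sketch; the logger name is arbitrary):

# valueChanged event body (sketch)
def onWriteComplete(results):
    # results is a list of QualityCode objects, one per written path
    if not all(code.isGood() for code in results):
        system.util.getLogger("TagWrites").warn("Async write to %s did not complete cleanly" % str(tagPath))

if currentValue.value:
    system.tag.writeAsync([tagPath], [False], onWriteComplete)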
Thank you for your input. The root issue here is network stability which is obviously something we need to improve.