We have an Ignition Gateway that seems to die a slow death over the course of a couple of hours. It then restarts and the process begins again. The symptoms we see are high CPU utilization on the Gateway Status page and threads with a state of Blocked accumulating on the Threads page. We've deployed this application to a number of sites, and this is the first that has experienced this problem. This is a Perspective application running on Ignition v8.1.3 and incorporating SepaSoft OEE/Performance. Is there a way to determine what these threads are and what resource they seem to be blocked by?
The definitive answer is to use Flight Recorder or the JDK's debugging tools to examine the problem threads. That is not particularly simple to execute, but Support can probably help with it.
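If you can get a shell on the Gateway host, one JDK-level starting point is `jcmd <pid> Thread.print`, which prints every thread along with a `java.lang.Thread.State:` line. Here is a minimal Python sketch that captures that output and tallies the states. It assumes a JDK's `jcmd` is on the PATH (Ignition's bundled runtime may not include it) and that it runs as the same OS user as the Gateway; the function names are illustrative, not part of any Ignition tooling.

```python
import re
import subprocess
from collections import Counter

# Matches the state line printed under each thread header, e.g.
#    java.lang.Thread.State: BLOCKED (on object monitor)
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text: str) -> Counter:
    """Tally thread states (RUNNABLE, BLOCKED, WAITING, ...) in a text thread dump."""
    return Counter(STATE_RE.findall(dump_text))


def capture_thread_dump(pid: int) -> str:
    """Run `jcmd <pid> Thread.print` and return its text output.

    Assumes a JDK `jcmd` is on PATH and that this runs as the same OS user
    as the Gateway JVM; otherwise jcmd cannot attach to the process.
    """
    result = subprocess.run(
        ["jcmd", str(pid), "Thread.print"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage would look like `count_thread_states(capture_thread_dump(12345))`, where 12345 stands in for the Gateway's PID. Taking a few of these a minute apart shows whether the BLOCKED count keeps climbing.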
In the meantime, these topics might help you trim the list of possible culprits:
In addition to Phil's suggestion, an easy place to start would be to download a thread dump from the Threads page once the blocked threads begin accumulating and post it here for us to look at. Sometimes blocked threads are expected or otherwise okay; sometimes they aren't.
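Once you have the downloaded JSON, a short script can summarize it. This sketch assumes the file has a top-level `threads` list whose entries carry `name`, `state`, and `stacktrace` (a list of frame strings); those field names are an assumption, not a confirmed description of Ignition's export, so check them against your own file. It counts threads per state and groups BLOCKED threads by their top stack frame, which is often enough to see which code they are piling up in.

```python
import json
from collections import Counter

# ASSUMPTION: the dump is JSON shaped roughly like
#   {"threads": [{"name": "...", "state": "BLOCKED", "stacktrace": ["frame", ...]}, ...]}
# Verify these field names against a real file before relying on the output.


def load_dump(path: str) -> dict:
    """Read a thread dump JSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_dump(dump: dict) -> tuple[Counter, Counter]:
    """Return (thread count per state, BLOCKED thread count per top stack frame)."""
    states = Counter()
    blocked_at = Counter()
    for thread in dump.get("threads", []):
        state = thread.get("state", "UNKNOWN")
        states[state] += 1
        if state == "BLOCKED":
            # The top frame is where the thread is stuck waiting for the lock.
            frames = thread.get("stacktrace") or ["<no stack>"]
            blocked_at[frames[0]] += 1
    return states, blocked_at
```

For example, `states, blocked_at = summarize_dump(load_dump("thread_dump.json"))` followed by `print(blocked_at.most_common(5))` lists the five frames with the most blocked threads.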
Kevin, I thought I'd better check back in on this topic. The problem we were seeing just disappeared, and at this point we cannot explain what started it or what ended it.
We just had a problem with MQTT: over 100 blocked threads at one time, with CPU and memory maxing out.
Nothing fixed it until we did a fresh install of Ignition. We first tried restoring the nightly backup, and it was still maxing out. One of our guys said the Ignition install can sometimes become corrupted and that he has had to do this before. I've personally never had to resort to reinstalling Ignition, but I figured I'd share this here.
Please call either IA or Cirrus Link support next time. Reinstalling Ignition in response to a problem like this is nonsense. A thread dump, logs, and/or a live look at the system would help.
Ignition-AC-IGN-PGW01_thread_dump20220331-123728.json (1.0 MB)
Ignition-AC-IGN-PGW01_thread_dump20220331-120458.json (1.3 MB)
Ignition-AC-IGN-PGW01_thread_dump20220331-120357.json (999.8 KB)
I'm going to leave my opinion on my co-worker's troubleshooting approach out of it, since I wasn't involved. I did log in, observed over 100 blocked MQTT threads, and downloaded three thread dumps. The threads would stay blocked for only a few moments, clear out, and then shortly afterwards become blocked again.
Would you mind taking a peek and sharing any insight into what the problem may have been? Maybe the stack traces will give you some clues.
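Because the blocked threads come and go, comparing the three dumps can be more telling than reading any one of them. A minimal sketch, assuming a JSON layout with a top-level `threads` list whose entries have `name` and `state` fields (not confirmed against Ignition's actual format), splits BLOCKED thread names into those blocked in every dump and those blocked in only some.

```python
import json

# ASSUMPTION: each dump is JSON like {"threads": [{"name": "...", "state": "BLOCKED"}, ...]}.


def load_dump(path: str) -> dict:
    """Read one thread dump JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def blocked_names(dump: dict) -> set[str]:
    """Names of all threads in BLOCKED state in a single dump."""
    return {t["name"] for t in dump.get("threads", []) if t.get("state") == "BLOCKED"}


def persistent_vs_transient(dumps: list[dict]) -> tuple[set[str], set[str]]:
    """Split BLOCKED names into (blocked in every dump, blocked in only some)."""
    sets = [blocked_names(d) for d in dumps]
    if not sets:
        return set(), set()
    always = set.intersection(*sets)
    ever = set.union(*sets)
    return always, ever - always
```

A name that appears in every dump does not prove it is the same stuck thread, since pool workers reuse names, but it is a reasonable place to start reading stack traces.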
I think we need @wes0johnson from Cirrus Link to take a look as well, but it looks to me like one edge node being rebrowsed after a reconnect is blocking the rest of MQTT Engine from processing incoming message payloads.
At first glance it looks like lots of Edge Nodes are not able to stay connected; specifically, lots of rebirth requests are being sent out. This can happen for a number of reasons. I'd recommend taking a look at this: I have data that is toggling between stale and healthy at my subscribing MQTT Client - MQTT Modules for Ignition 8.x - Confluence. It might be best to send a note to support@cirrus-link.com if this is still an issue. If you do open a ticket, make sure to note your MQTT Engine version and your Transmission/client versions as well.