Perspective Threads

We had a change go into one of our Perspective applications that introduced some circular binding references and drove CPU utilization up significantly. Unfortunately, it ran like this for two weeks (with dozens of users each day) before we figured out what was happening, so a lot of perspective-worker and perspective-queue threads accumulated as a result.

We’ve identified and remedied the issue in the application, and when we reproduced this in a different environment, the stuck threads and high CPU resolved instantly after fixing it. But in this environment where they’ve been accumulating for a longer period of time, I’m wondering if it’s expected for the CPU/thread issues to take a bit of time to resolve.

I have already terminated any stale Perspective sessions and Designer sessions that were up since before the fix was made. The statuses of the threads themselves are changing (I watched a couple go from BLOCKED to TIMED_WAITING); I just want to make sure things are changing in the right direction, since the CPU utilization is still very high. Is there anything else we could be doing to help these threads along now that the root cause has been identified?

Does your ignition.conf have any parameters related to Perspective threading in it?

The easy answer is to restart the gateway, but I imagine you're asking because you don't want to/can't do that.

TIMED_WAITING is almost always some form of .sleep(). Perspective has more tolerance for sleeping in scripts than Vision does, but not a lot more. Perhaps you should find where that is happening. (A thread dump can help.)
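
If you'd rather not eyeball the whole dump, a rough filter along these lines can be run from the gateway script console. This is just a sketch: it assumes an 8.1-style system.util.threadDump() that returns the dump as one big string, and the exact output format varies by version, so it's only a crude text search.

dump = system.util.threadDump()

# Crude filter: flag any thread whose stack mentions sleeping. Splitting on
# blank lines assumes a plain-text style dump; adjust if yours comes back as JSON.
for block in dump.split("\n\n"):
    if "Thread.sleep" in block or "time.sleep" in block:
        print(block.splitlines()[0])  # just the thread header line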


This environment of ours is managed by a 3rd party cloud provider so I don’t readily have access to that .conf file without reaching out to them, which I can do if this continues without improvement.

However, I had thought that Perspective threads were maintained even after a gateway restart. It would be a hassle to coordinate a restart, but not out of the realm of possibility if that will help.

I have made sure we are not doing anything with sleep() in the application. Here is the thread info for one of the perspective-queue threads that is in TIMED_WAITING:

Thread [perspective-queue-15697] id=342591, (TIMED_WAITING for java.util.concurrent.SynchronousQueue$TransferStack@3158707b)
java.base@17.0.13/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.13/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.base@17.0.13/java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
java.base@17.0.13/java.util.concurrent.SynchronousQueue.poll(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base@17.0.13/java.lang.Thread.run(Unknown Source)

And here is a perspective-worker thread that is in TIMED_WAITING:

Thread [perspective-worker-40811] id=342328, (TIMED_WAITING for java.util.concurrent.SynchronousQueue$TransferStack@5933f5d4)
java.base@17.0.13/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.13/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.base@17.0.13/java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
java.base@17.0.13/java.util.concurrent.SynchronousQueue.poll(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base@17.0.13/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
com.inductiveautomation.perspective.gateway.threading.BlockingWork$BlockingWorkRunnable.run(BlockingWork.java:58)
java.base@17.0.13/java.lang.Thread.run(Unknown Source)

Ok. That looks like what you'd get if you have a busy spike in Perspective that causes it to expand its thread pool, followed by a lull leaving a bunch of idle threads. If so, I would think it harmless. (But a thread dump during a busy spike might give more useful info.)


Ok, thanks. For what it’s worth, the number of Perspective threads has gone down from about 200 to 100 in a little over an hour, so I may just need to be patient, as things seem to be moving in the right direction.
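
If it helps to watch that count trend down over time, something like the sketch below tallies it from the gateway script console using the standard JMX thread bean. The "perspective-" prefix match is an assumption based on the thread names shown above.

from java.lang.management import ManagementFactory

bean = ManagementFactory.getThreadMXBean()
infos = bean.getThreadInfo(bean.getAllThreadIds())

# Entries can be null for threads that died between the two calls above.
live = [info for info in infos if info is not None]
perspective = [info for info in live if info.getThreadName().startswith("perspective-")]
print("%d perspective threads out of %d total" % (len(perspective), len(live)))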

If you can take a gateway backup, you can look inside it for a high-probability guess of what's set.
Note that changing these parameters is fairly unusual, so the most likely scenario is that they're not populated at all.
Also an unlikely possibility, but worth mentioning - if your cloud provider is using containerization (likely), they could be overriding arbitrary system properties at runtime. You would have to interrogate these system properties inside the actual gateway to determine this, e.g. with system.util.getProperty. I wouldn't bother.
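
If you did want to check from inside the gateway, something like the sketch below works from the script console. It uses java.lang.System.getProperty directly, since that accepts arbitrary keys; the property names listed are placeholders, not the real Perspective threading keys, so look those up in the manual first.

from java.lang import System

# Placeholder keys: substitute the actual Perspective threading property
# names from the manual / your ignition.conf wrapper entries.
for key in ["perspective.worker.example", "perspective.queue.example"]:
    print("%s = %s" % (key, System.getProperty(key)))  # None means the default is in effect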

JVM threads are destroyed on process shutdown and not persisted in any way. In practice, especially on a live system, your post-shutdown steady state will likely look almost the same as the pre-shutdown steady state, but they are not "the same" threads.

Again, I wouldn't necessarily bother, because...

This is not a number to be worried about. Your thread IDs, in the 340,000 range, suggest something (likely the issue you already identified and resolved) ended up creating hundreds of thousands of threads.
Perspective's default thread pool management, outlined in the manual page I linked above, creates as many threads as needed, recycling them when it can, with a sixty-second idle expiration at the end. That, as Phil suggested, means "bursts" of work create lots of threads (because there's nothing idle to recycle, so new threads have to be created).

If it's really a concern, or you just want to avoid the potentially spiky behavior, you can instead set an explicit pool size for the worker threads. At that point the time-to-live is ignored and you're always going to have N threads in the specified pool - no more, no less.
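
To make the two modes concrete, here's a small sketch using the same JDK executor classes that appear in the stack traces above (runnable in a Jython script console). The sixty-second idle expiration and the grow-on-demand shape come from the description above; the specific pool size is illustrative, not Perspective's actual configuration.

from java.lang import Integer
from java.util.concurrent import (ThreadPoolExecutor, SynchronousQueue,
                                  LinkedBlockingQueue, TimeUnit)

# Elastic mode: no core threads, grow as needed under a burst, and each idle
# thread expires after 60 seconds, which is why the counts drift back down
# on their own once the load stops.
elastic = ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS, SynchronousQueue())

# Fixed mode: an explicit pool size means exactly N threads, always; the
# keep-alive / time-to-live no longer matters.
N = 32  # illustrative only
fixed = ThreadPoolExecutor(N, N, 0, TimeUnit.MILLISECONDS, LinkedBlockingQueue())

elastic.shutdown()
fixed.shutdown()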
