Execution Manager stalling

jtdfan · August 17, 2023, 2:12pm

We have a runnable, registered with executionManager.register, that from time to time just stops processing.

Our run function does not throw any exceptions, as we catch them all and log them in the hope that it keeps going.

What would cause an object on the shared ExecutionManager to stall and stop processing? Both of ours quit executing and we aren't seeing anything in the wrapper log.

Any help would be appreciated.

Kevin.Herron · August 17, 2023, 3:08pm

I can only think of 2 scenarios...

1. your task threw an Error, not an Exception/Throwable, such as StackOverflowError or OutOfMemoryError
2. all threads of the ExecutionManager's fixed-size thread pool are blocked

A thread dump is a good place to start because it would at least show #2.
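
For anyone who wants to check the pool from inside the JVM as well, here is a minimal sketch using the standard ThreadMXBean. The class name PoolThreadReport and the name-prefix parameter are made up for illustration; nothing here is part of the Ignition SDK. Note that a thread sitting in a native socket read usually reports RUNNABLE rather than BLOCKED, so the top stack frame is more telling than the state.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Ignition API: summarizes live threads by name prefix.
final class PoolThreadReport {

    /** Returns one line per live thread whose name starts with namePrefix. */
    static List<String> describe(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip monitor and synchronizer details, which cost more to collect.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(info.getThreadName() + " " + info.getThreadState() + " at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The prefix to pass depends on how the pool names its threads.
        describe(args.length > 0 ? args[0] : "main").forEach(System.out::println);
    }
}
```

If every thread of the pool shows the same top frame across several reports taken a few seconds apart, that points at scenario 2.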

pturmel · August 17, 2023, 3:14pm

Error is a subclass of Throwable. Catch and log throwables in runnables, not exceptions.
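
A minimal sketch of what that can look like, as a plain-JDK wrapper. GuardedRunnable and the onFailure callback are hypothetical names used only for illustration; in a real module the callback would hand the Throwable to whatever logger you already use.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical wrapper, not an Ignition API: keeps a recurring task from dying silently.
final class GuardedRunnable implements Runnable {
    private final Runnable delegate;
    private final Consumer<Throwable> onFailure;

    GuardedRunnable(Runnable delegate, Consumer<Throwable> onFailure) {
        this.delegate = Objects.requireNonNull(delegate);
        this.onFailure = Objects.requireNonNull(onFailure);
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Throwable t) {
            // Throwable covers Exception and Error alike (StackOverflowError, OutOfMemoryError, ...).
            try {
                onFailure.accept(t);
            } catch (Throwable ignored) {
                // A failing failure handler must not escape either.
            }
        }
    }

    public static void main(String[] args) {
        Runnable task = new GuardedRunnable(
                () -> { throw new StackOverflowError("simulated"); },
                t -> System.err.println("task failed: " + t));
        task.run(); // logs the failure and returns normally
    }
}
```

A catch (Exception e) block in the same place would let the simulated StackOverflowError propagate out of run().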

Kevin.Herron · August 17, 2023, 3:38pm

Oops, yeah. And we already catch stray Throwables for tasks submitted to the ExecutionManager.

jtdfan · August 17, 2023, 3:50pm

So, to be clear, the runnable should catch any Throwable, not just Exceptions, to keep it from stopping in the execution manager?

Kevin.Herron · August 17, 2023, 4:00pm

It should, but it doesn't actually have to. The ExecutionManager wraps submitted tasks and catches stray Throwables, and logs an error to the gateway logs if that happens.

jtdfan · August 17, 2023, 4:10pm

We aren't seeing the error in the log or maybe we aren't sure what error to look for. The runnable just stops processing.

Any idea what it might show up as in the wrapper log?

Kevin.Herron · August 17, 2023, 4:12pm

Something like "Task %s %s threw uncaught exception."

I'm not going to help you any more until you produce a thread dump...

jtdfan · August 17, 2023, 4:18pm

I can produce a thread dump, but I have already got the system back up and running. Production was down. So is the thread dump still helpful?

Ignition-ICOB-MOCLIFT01_thread_dump20230817-111540.json (154.2 KB)

Kevin.Herron · August 17, 2023, 4:29pm

Yikes. You really shouldn't be doing module development against a production gateway.

Maybe not as useful, but I'll look anyway. Better to get one while the thread growth and other problems are actually being observed.

jtdfan · August 17, 2023, 4:31pm

We weren't doing development against production. It was observed in production and we were trying to find a solution to test in our dev environment. The thing that is stumping us is that it took weeks for this to appear again and it doesn't appear at every site using this version of our module.

I grabbed the wrapper log and our system's log data and didn't find anything. It is good to know that we should grab a thread dump as well if this happens again.

Thank you for your help and let me know if you find anything.

Kevin.Herron · August 17, 2023, 4:34pm

This thread dump looks fairly innocuous, but keep an eye on the 12 threads from the ExecutionManager, named gateway-shared-exec-engine-N.

This dump has 3 blocked with stack traces like this:


java.base@11.0.15/java.net.SocketInputStream.socketRead0(Native Method)
java.base@11.0.15/java.net.SocketInputStream.socketRead(Unknown Source)
java.base@11.0.15/java.net.SocketInputStream.read(Unknown Source)
java.base@11.0.15/java.net.SocketInputStream.read(Unknown Source)
app//org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
app//org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
app//org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
app//org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
app//org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
app//org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
app//org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
app//org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
app//org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
app//org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
app//org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
app//org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
app//org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
app//org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
app//org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
app//org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
app//org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
com.flexware.lift.mir.AbstractApiClient.doGet(AbstractApiClient.java:116)
com.flexware.lift.mir.robot.api.MissionQueueApi.getMissionQueue(MissionQueueApi.java:34)
com.flexware.lift.mir.gateway.MirRobotUpdateRunner.run(MirRobotUpdateRunner.java:239)
app//com.inductiveautomation.ignition.common.execution.impl.BasicExecutionEngine$TrackedTask.run(BasicExecutionEngine.java:587)
java.base@11.0.15/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
java.base@11.0.15/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
java.base@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base@11.0.15/java.lang.Thread.run(Unknown Source)

If those are blocking indefinitely, or for a long time, and you end up with 12 blocked at once, all your other submitted tasks will be waiting in line and not executing until they unblock.
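
To make that concrete, here is a small plain-JDK sketch of the effect. It uses an ordinary fixed thread pool as a stand-in for the shared engine, and a latch as a stand-in for a socket read that never returns; PoolStarvationDemo and its method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a generic fixed pool, not the actual ExecutionManager.
final class PoolStarvationDemo {

    /**
     * Fills a fixed pool with blocked tasks, queues one more, and returns
     * { ranWhileBlocked, ranAfterRelease } for the queued task.
     */
    static boolean[] run(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allBlocked = new CountDownLatch(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch queuedRan = new CountDownLatch(1);
        try {
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> {
                    allBlocked.countDown();
                    try {
                        release.await(); // stands in for a read with no timeout
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            allBlocked.await();               // every pool thread is now occupied
            pool.execute(queuedRan::countDown); // this one can only wait in the queue
            boolean ranWhileBlocked = queuedRan.await(200, TimeUnit.MILLISECONDS);
            release.countDown();              // unblock the pool
            boolean ranAfterRelease = queuedRan.await(5, TimeUnit.SECONDS);
            return new boolean[] { ranWhileBlocked, ranAfterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] { false, false };
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run(12);
        System.out.println("ran while pool was blocked: " + result[0]);
        System.out.println("ran after release: " + result[1]);
    }
}
```

The queued task does nothing wrong and still does not run until one of the blocked tasks returns, which from the outside looks exactly like a runnable that "just stops processing".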