I am using gateway messages (on dedicated threads) to run scripts on the gateway that will not work on clients (mostly because of network name resolution issues). Today, one of my handlers stopped responding for reasons that I don’t understand. Checking the thread diagnostics revealed this:
The startSimulation handler is the one that was not working, and it’s hanging out in the RUNNABLE state, whereas the other handlers are all WAITING.
I was able to fix the problem by disabling and then re-enabling the offending handler and then saving the project. However, if this happened once it can happen again so I’d like to understand what happened. Does anyone have any suggestions?
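RUNNABLE doesn’t mean much by itself; a Java thread that is blocked inside a native call, such as a socket read, is still reported as RUNNABLE. Did you happen to export a thread dump, or copy the stack trace of that thread out?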
Sadly no, I looked at it but didn’t have the presence of mind to copy it. I’m guessing I’ll need to wait for the problem to occur again before it can be analyzed properly?
This problem finally surfaced again and I was able to export a dump and a trace. Does this shed any light on the problem?
thread_dump.txt (93.8 KB)
I’d guess it’s this one:
Daemon Thread [gateway-messagehandler-AndritzRL-startSimulation-1] id=1356, (RUNNABLE) (native)
owns monitor: java.lang.Object@3b0b59e3
owns synchronizer: java.util.concurrent.ThreadPoolExecutor$Worker@4d806696
sun.nio.ch.SocketDispatcher.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(Unknown Source)
sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
sun.nio.ch.IOUtil.read(Unknown Source)
sun.nio.ch.SocketChannelImpl.read(Unknown Source)
sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
java.lang.reflect.Method.invoke(Unknown Source)
org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:186)
org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:204)
org.python.core.PyObject.__call__(PyObject.java:404)
org.python.core.PyObject.__call__(PyObject.java:408)
org.python.core.PyMethod.__call__(PyMethod.java:124)
It’s blocked reading from a socket that probably has an infinite read timeout set.
Edit: there’s another one blocked on a socket read too:
Daemon Thread [MainThread] id=1291, (RUNNABLE) (native)
owns monitor: java.lang.Object@1d01047b
owns synchronizer: java.util.concurrent.ThreadPoolExecutor$Worker@418b8123
sun.nio.ch.SocketDispatcher.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(Unknown Source)
sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
sun.nio.ch.IOUtil.read(Unknown Source)
sun.nio.ch.SocketChannelImpl.read(Unknown Source)
sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
java.lang.reflect.Method.invoke(Unknown Source)
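To illustrate what a read timeout changes, here is a minimal sketch using plain Python sockets. The function name `read_with_timeout` and the local listener are made up for the demo; the listener just stands in for a peer that accepts a connection and then never sends anything. With a timeout set, the read raises `socket.timeout` instead of blocking forever.

```python
import socket


def read_with_timeout(timeout_seconds=0.2):
    """Connect to a silent local peer and try to read from it."""
    # Hypothetical stand-in for an unresponsive server: it listens
    # but never sends a single byte back.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(listener.getsockname())
        # Without this call, the recv() below would block indefinitely,
        # which is the state the stack traces above show.
        client.settimeout(timeout_seconds)
        try:
            client.recv(1024)
        except socket.timeout:
            return "timed out"
        return "got data"
    finally:
        client.close()
        listener.close()
```

Calling `read_with_timeout()` should come back after roughly the timeout with `"timed out"` rather than hanging the calling thread.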
Thank you! That handler is using xmlrpclib, which is probably the culprit. Specifying a timeout should be easy enough.
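For anyone who lands here later, one possible way to give xmlrpclib a timeout is a small `Transport` subclass. This is only a sketch: it assumes Python/Jython 2.7 or newer (where `make_connection` returns an `HTTPConnection`), and `TimeoutTransport`, `make_proxy`, and the URL in the usage note are hypothetical names, not anything from the original handler.

```python
try:
    import xmlrpclib                   # Python 2 / Jython name
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3 name for the same module


class TimeoutTransport(xmlrpclib.Transport):
    """Transport that applies a socket timeout to each HTTP connection."""

    def __init__(self, timeout=10.0, use_datetime=False):
        xmlrpclib.Transport.__init__(self, use_datetime)
        self.timeout = timeout  # seconds; the default is an arbitrary choice

    def make_connection(self, host):
        conn = xmlrpclib.Transport.make_connection(self, host)
        # The connection is opened lazily, so setting the attribute here
        # takes effect when the request is actually sent.
        conn.timeout = self.timeout
        return conn


def make_proxy(url, timeout=10.0):
    """Build a ServerProxy whose calls fail with a timeout instead of hanging."""
    return xmlrpclib.ServerProxy(url, transport=TimeoutTransport(timeout))
```

Usage would look something like `proxy = make_proxy("http://simhost:8000/RPC2", timeout=5.0)`, with the handler catching `socket.timeout` around the remote call. For an `https` URL the subclass would need to extend `xmlrpclib.SafeTransport` instead. `socket.setdefaulttimeout()` is a simpler alternative, but it changes the default for every socket created afterwards in the same interpreter, so the per-proxy transport seems like the safer choice on a shared gateway.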
Hopefully understanding what caused the issue makes it easier to replicate for testing the fix.