Timer script stuck running for 2 hours on a call to httpClient.get during an internet interruption

I’m trying to work out why a timer script that requests a URL using system.net.httpClient().get(…) has gotten stuck a few times after the site's internet dropped out briefly while the script was running.

I found the stack trace in the thread dump. This is the top half:

java.base@17.0.13/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.13/java.util.concurrent.locks.LockSupport.park(Unknown Source)
java.base@17.0.13/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)
java.base@17.0.13/java.util.concurrent.ForkJoinPool.unmanagedBlock(Unknown Source)
java.base@17.0.13/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)
java.base@17.0.13/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)
java.base@17.0.13/java.util.concurrent.CompletableFuture.get(Unknown Source)
platform/java.net.http@17.0.13/jdk.internal.net.http.HttpClientImpl.send(Unknown Source)
platform/java.net.http@17.0.13/jdk.internal.net.http.HttpClientFacade.send(Unknown Source)
app//com.inductiveautomation.ignition.common.script.builtin.http.JythonHttpClient.send(JythonHttpClient.java:102)
app//com.inductiveautomation.ignition.common.script.builtin.http.JythonHttpClient.get(JythonHttpClient.java:308)
jdk.internal.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
java.base@17.0.13/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
java.base@17.0.13/java.lang.reflect.Method.invoke(Unknown Source)
app//org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:190)
app//org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:208)
app//org.python.core.PyObject.__call__(PyObject.java:477)
app//org.python.core.PyObject.__call__(PyObject.java:481)
app//org.python.core.PyMethod.__call__(PyMethod.java:141)
org.python.pycode._pyx116288.getForecastData$6(<module:project.apis.nem>:283)
org.python.pycode._pyx116288.call_function(<module:project.apis.nem>
...

A couple of things:

  1. According to the bottom of the stack, the call came from function getForecastData, line 283 of script library project.apis.nem. That line is inside the function, but it has nothing to do with an HTTP client (see below). Note: extractMultiTableCSV is just doing basic list/dict manipulation. Is the trace coming from a compiled or otherwise different running version of my code? How can I view that instance to see where the error is actually coming from?

  2. This is the line of code I believe the error is actually coming from:

    [image]
    where CLIENT is defined at the top as [image]
    (which I’ve replaced with [image])

    Note: this was originally called with just CLIENT.get(reportUrl). I’ve since added the timeout parameter explicitly, but I believe it defaults to 60 s anyway.

I’ve tried to simulate this in a dev environment using WebDev and a doGet with a time.sleep in it to delay returning. I then used client.get(…) to request from it and stopped the Ignition service before the delay expired, but the get method always produces a Java IOException timeout. I can’t get it to just keep waiting without producing the exception.

Is there something else in that stack trace that suggests some other cause, and how would I resolve it?

I would suspect deep packet inspection middleware faking the open TCP channel while the internet was down.
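
To test that theory in a dev environment, one option is a server that accepts the TCP connection and then never sends a byte, which is roughly what a middlebox holding the channel open would look like to the client. This is only a sketch using plain Python sockets; start_silent_server is a hypothetical helper name, and it is meant to run in an ordinary Python interpreter on a test machine, not as part of the gateway script.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=0):
    """Accept TCP connections and never reply to them.

    Returns (server_socket, bound_port). port=0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        while True:
            try:
                conn, _addr = server.accept()
            except socket.error:
                return  # server socket was closed
            held.append(conn)  # never read, never write, never close

    worker = threading.Thread(target=accept_loop)
    worker.daemon = True  # do not keep the interpreter alive
    worker.start()
    return server, server.getsockname()[1]
```

Pointing client.get(…) at http://127.0.0.1:<port>/ with a short request timeout would then show whether the call raises or keeps waiting when the connection stays open but silent.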

How would I prevent that? Or at least time out after x seconds of not getting anything back?

No clue. Sounds like a bug in the underlying Java. What version is this?

The timeout passed when creating the client is the connect timeout. To put a timeout on the request itself, you also have to specify a timeout parameter in the actual request.

Might still be an underlying bug, since the request timeout is also supposed to default to 60 s.
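
A minimal sketch of setting both timeouts, assuming the timeout keyword on system.net.httpClient is the connect timeout and the timeout keyword on get is the request timeout, both in milliseconds. The helper names, constants, and values are hypothetical; the functions take the system.net module and the client as arguments only so the sketch can be read and exercised outside Ignition.

```python
# Hypothetical values, in milliseconds.
CONNECT_TIMEOUT_MS = 10000  # how long to wait for the TCP connection
READ_TIMEOUT_MS = 30000     # how long to wait for the response


def build_client(net):
    """Create the client; `net` is expected to be Ignition's system.net."""
    # The timeout here only bounds establishing the connection.
    return net.httpClient(timeout=CONNECT_TIMEOUT_MS)


def fetch_report(client, report_url):
    """Issue the GET with an explicit per-request timeout."""
    # The timeout here bounds waiting for the response to this request.
    return client.get(report_url, timeout=READ_TIMEOUT_MS)
```

In a gateway script this would be used as CLIENT = build_client(system.net) at the top of the library, then fetch_report(CLIENT, reportUrl) at the call site.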

Consider using the async version and providing your own timeout to Promise.get(...).
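
Something along these lines, as a rough sketch rather than tested gateway code. It assumes the client has getAsync(url, timeout=...) returning a Promise, and that the Promise has get(timeoutMs) and cancel(); check those against the manual for your Ignition version. get_with_deadline and deadline_ms are hypothetical names.

```python
def get_with_deadline(client, url, deadline_ms=30000):
    """Start an async GET and wait at most deadline_ms for the result.

    `client` is expected to be the object returned by system.net.httpClient().
    """
    promise = client.getAsync(url, timeout=deadline_ms)
    finished = False
    try:
        # Blocks the calling thread, but only up to deadline_ms.
        response = promise.get(deadline_ms)
        finished = True
        return response
    finally:
        # try/finally rather than `except Exception`, so that Java
        # exceptions raised under Jython are handled the same way.
        if not finished:
            promise.cancel()  # best effort: stop waiting on the request
```

The caller still sees the original exception when the deadline passes, so the timer script can log it and carry on to its next run instead of sitting in the blocking send.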