VisualVM cannot detect the Ignition gateway process


I'm running Ignition Gateway 8.1.28. The gateway has been crashing frequently, and I'm trying to analyze the issue with VisualVM, but I can't get it to connect to the gateway process.

Recently we often see errors like 'Clock drift, degraded performance, or pause-the-world detected. Max allowed deviation=1000ms, actual deviation=5741ms'.


The CPU usage on the server is consistently low, and it only increases when these errors occur. Is this normal?

Thanks in advance.

Did you set the debugging parameters in ignition.conf? There are commented-out samples in the original file. Ignition's JVM doesn't listen for debugger connections without those settings.

Edit: Hmmm, these are what I use to enable VisualVM:

wrapper.java.additional.11=-Dcom.sun.management.jmxremote
wrapper.java.additional.12=-Dcom.sun.management.jmxremote.port=1089
wrapper.java.additional.13=-Dcom.sun.management.jmxremote.ssl=false
wrapper.java.additional.14=-Dcom.sun.management.jmxremote.authenticate=false
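
With those in place (and the gateway restarted), open VisualVM and use File > Add JMX Connection, pointing it at the gateway host and the port from the config above, e.g. <gateway-host>:1089, or the full service URL if plain host:port doesn't take:

service:jmx:rmi:///jndi/rmi://<gateway-host>:1089/jmxrmi

Be aware that ssl=false and authenticate=false leave that port completely open, so keep it firewalled or restricted to a trusted network.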

Also, I'd guess your hardware is somewhat underpowered and garbage collection is causing those spikes. Consider adding the following settings to ignition.conf:

wrapper.java.additional.A=-XX:MaxGCPauseMillis=100
wrapper.java.additional.B=-XX:+PrintGCDetails

(Where .A and .B are integers that do not conflict with any other additional parameters.)
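
One note: on the Java 11/17 runtime bundled with Ignition 8.1, -XX:+PrintGCDetails is deprecated in favor of unified logging, so if you'd rather send GC details to a rolling file instead of the wrapper log, something along these lines should work (again, pick an index that doesn't conflict, and adjust the path):

wrapper.java.additional.C=-Xlog:gc*:file=logs/gc.log:time,uptime:filecount=5,filesize=10m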

Following your previous suggestions, we made some modifications:

  1. Modified the ignition.conf file.
    • wrapper.java.additional.7=-XX:MaxGCPauseMillis=100
    • wrapper.java.additional.8=-XX:+PrintGCDetails
  2. Changed the gateway settings.
    • Set the CPU Usage Threshold to about 15%
    • Set CPU Usage Exceedance Duration to 5

We then hit another batch of clock drift errors, and shortly afterward the gateway crashed. While the issue was occurring, we kept receiving error messages like this:


The thread dump files automatically exported by Ignition during the incident:
thread-dump-2024-02-28-144211-1.json (7.9 MB)
thread-dump-2024-02-28-144211-2.json (5.7 MB)
thread-dump-2024-02-28-144211-3.json (3.4 MB)
Additionally, the JVM status observed through VisualVM is shown in the image below.

The status of the VMware virtual machine running Ignition:

However, we are unsure how to analyze these dumps to identify the problem. What should our next steps be after exporting them?
For example, the exported files contain many threads named perspective-worker and Jython-Netty-Client.
What are these threads used for?
Under what circumstances would so many of them be running?
Is this normal?
How can we avoid these situations?

Today we encountered a similar issue again: the thread count climbed to over 10,000 and the gateway crashed outright.

Ignition-BJP1-BACC-01_thread_dump20240229-121552-1.json (6.2 MB)
Ignition-BJP1-BACC-01_thread_dump20240229-121552-2.json (5.7 MB)
Ignition-BJP1-BACC-01_thread_dump20240229-121552-3.json (5.7 MB)
Ignition-BJP1-BACC-01_thread_dump20240229-121552-4.json (7.5 MB)

That's a lot of material. I suggest you open a support ticket. (Given the TxTimeouts, I suspect you have some UI element that can create extreme database loads, and the processing of such a dataset swamps your system.)
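
In the meantime, a quick way to see where that thread count is going is to group the names in the exported JSON dumps. Here's a rough sketch, assuming the export has a top-level "threads" array whose entries carry a "name" field (check the actual structure of your files and adjust the keys if they differ):

# tally_threads.py - group thread names from Ignition thread-dump JSON exports
import json
import re
import sys
from collections import Counter

counts = Counter()
for path in sys.argv[1:]:
    with open(path, "r", encoding="utf-8") as f:
        dump = json.load(f)
    for thread in dump.get("threads", []):
        # Strip a trailing numeric suffix so e.g. perspective-worker-1234
        # and perspective-worker-1235 fall into the same bucket.
        name = re.sub(r"[-_]?\d+$", "", thread.get("name", ""))
        counts[name] += 1

for name, count in counts.most_common(20):
    print(f"{count:6d}  {name}")

Run it against all of the dump files at once (python tally_threads.py *.json). If most of the 10,000+ threads collapse into one or two groups, that's the pool to hand to support as a starting point.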

I suggest you open a support ticket.

How should I do it?

Another thing we've noticed is that VisualVM shows high CPU usage, but Windows Task Manager never shows the CPU going above 10%. Regarding the -XX:ActiveProcessorCount=<N> parameter, what is its default value for Ignition? Could the slowdown be related to this?

The issue is now seriously affecting normal production. Could you please give some advice? Thank you.

https://support.inductiveautomation.com/
