Clock Drift from PLC Timeout?

jacob_geers · May 6, 2025, 6:34pm

Hello,

Looking for input on how to troubleshoot clock drift. We get several in a row occasionally but the CPU never spikes - is this a concern?

Based on the logs I notice we also get ABMicroLogixReadRequest warnings - could this be the issue? How do I go about troubleshooting? The two database issues are embarrassing but I don't think they are related.

Example #1

Example #2

jacob_geers · May 6, 2025, 6:48pm

Seems to always be these devices.

Kevin.Herron · May 6, 2025, 6:57pm

It's most likely the other way around - these timeouts are caused by the clock drift.
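
For illustration, here is a minimal sketch of how a drift check of this kind can work. It is not Ignition's actual detector: the function name, the 1 second interval and the 1 second threshold are all assumptions. The idea is to sleep for a fixed interval and compare how much time really passed. Any stall long enough to trip the check would presumably also hold up a driver thread waiting on a PLC response, which is why the timeouts tend to follow the drift rather than cause it.

```python
import time


def measure_drift(interval_s=1.0, samples=5, max_deviation_s=1.0,
                  clock=time.time, sleep=time.sleep):
    """Sleep repeatedly for a fixed interval and report how far off each wake-up was.

    Returns a list of (deviation_seconds, exceeded_threshold) tuples.
    The default wall clock (time.time) reacts to both process stalls and
    clock steps; pass time.monotonic to see stalls only.
    All defaults are illustrative assumptions, not Ignition's settings.
    """
    results = []
    for _ in range(samples):
        start = clock()
        sleep(interval_s)
        elapsed = clock() - start
        deviation = elapsed - interval_s  # positive means we woke up late
        results.append((deviation, abs(deviation) > max_deviation_s))
    return results


if __name__ == "__main__":
    for deviation, exceeded in measure_drift(samples=3):
        flag = "DRIFT" if exceeded else "ok"
        print(f"deviation={deviation * 1000:.1f}ms {flag}")
```

The clock and sleep functions are parameters only so the check can be exercised with a fake clock instead of waiting in real time.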

jacob_geers · May 6, 2025, 8:20pm

Is clock drift without high CPU a concern?

pturmel · May 6, 2025, 8:25pm

Yes, it can screw up history and typically indicates either untuned Java garbage collection or, in Windows, actual clock warps.

Consider adding the following to your ignition.conf file (with unique sequence numbers):

wrapper.java.additional.10=-XX:+UseG1GC
wrapper.java.additional.11=-XX:MaxGCPauseMillis=25
wrapper.java.additional.12=-Xlog:gc:file=logs/gc.log:t,tags:filecount=5,filesize=4M
wrapper.java.additional.13=-XX:-OmitStackTraceInFastThrow
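
On the unique sequence numbers: a quick way to check is to scan the file for the indices already in use. The sketch below is a hypothetical helper (the function names are mine, and it only looks at uncommented wrapper.java.additional.N lines), so treat it as a starting point rather than a validated tool.

```python
import re

# Matches uncommented lines such as: wrapper.java.additional.10=-XX:+UseG1GC
_ADDITIONAL_RE = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=\s*(.*)$")


def additional_indices(conf_text):
    """Map each wrapper.java.additional index to the list of values assigned to it."""
    found = {}
    for line in conf_text.splitlines():
        match = _ADDITIONAL_RE.match(line)
        if match:
            found.setdefault(int(match.group(1)), []).append(match.group(2).strip())
    return found


def duplicate_indices(conf_text):
    """Return only the indices that are assigned more than once."""
    return {i: vals for i, vals in additional_indices(conf_text).items() if len(vals) > 1}


def next_free_index(conf_text):
    """Return one past the highest index in use, or 1 if none are defined."""
    used = additional_indices(conf_text)
    return max(used) + 1 if used else 1
```

Read your ignition.conf into a string, pass it to duplicate_indices() to confirm nothing collides, and use next_free_index() to pick where the new lines should start.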

Restart your gateway and then watch the contents of the gc.log files.

If you are getting actual "pause-the-world" halts of > ~100ms, you should look for memory usage problems.
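
If eyeballing the log gets tedious, something like the following can pull out the long pauses. It is a rough sketch that assumes the line layout produced by the -Xlog options above (a bracketed timestamp, a bracketed tag, then something like "GC(12) Pause Young ... 8.123ms"); the exact text varies between Java versions, so adjust the pattern to what your log actually contains. The function name and the 100 ms default are my own choices.

```python
import re

# Assumed line shape (check against your own gc.log):
# [2025-05-28T12:07:03.456-0400][gc] GC(3) Pause Young (Normal) (G1 Evacuation Pause) 120M->45M(512M) 8.123ms
PAUSE_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]"          # first decoration: the timestamp
    r".*?\bGC\((?P<gc_id>\d+)\)\s+"    # GC cycle id
    r"(?P<kind>Pause\b.*?)\s+"         # only stop-the-world "Pause ..." entries
    r"(?P<ms>\d+(?:\.\d+)?)ms\s*$"     # duration at the end of the line
)


def long_pauses(lines, threshold_ms=100.0):
    """Return (timestamp, gc_id, duration_ms, description) for pauses above the threshold."""
    hits = []
    for line in lines:
        match = PAUSE_RE.match(line.strip())
        if match is None:
            continue  # concurrent phases and unrelated lines are skipped
        duration = float(match.group("ms"))
        if duration > threshold_ms:
            hits.append((match.group("stamp"), int(match.group("gc_id")),
                         duration, match.group("kind")))
    return hits
```

Typical use would be opening logs/gc.log, passing the file object to long_pauses(), and printing each hit. An empty result means no single pause crossed the threshold.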

jacob_geers · May 7, 2025, 3:04pm

Thanks, will try over the weekend. Here is something that may be interesting: gateway-shared-exec-engine is taking 20% of CPU when the clock drifts occur.

Kevin.Herron · May 7, 2025, 3:10pm

Might be interesting, but you would need the stack trace from that thread (click the "+" button to expand) to know any more. It will probably be tricky to get a snapshot of that thread exactly when the CPU usage is high and the clock drift is happening.

Clancy_Cavanaugh · May 7, 2025, 3:22pm

Are you on a 3 year old minor version by chance? I remember something similar occurring on our system a few years ago.

Our issue was caused by PLCs being powered off for maintenance, and Ignition appeared to be tying up threads waiting for the timeout. I believe it was fixed in version 8.1.15 though.

jacob_geers · May 7, 2025, 3:56pm

Here is a thread dump with one item using 17% during the time of clock drifts, but it is different than before.
site-dev_thread_dump20250507-115452.json (531.9 KB)

jacob_geers · May 7, 2025, 3:57pm

We are on 8.1.32.

jacob_geers · May 8, 2025, 1:34pm

Thanks, we ended up applying those gateway config changes.

12:07 5/28 - apply config change and restart service
... no clock drifts
00:24 5/29 - first clock drift
... a few clock drifts
5:51 5/29 - tons of clock drifts all hour

Attached is the gc.log file, if you wouldn't mind taking a look. My team and I are now playing around with VisualVM but don't have much experience with it.
gc.log (1.5 MB)

pturmel · May 8, 2025, 1:48pm

You do not have a GC problem. There's no pause greater than 200ms in the entire log.

Which means you have a workload problem. You will need to study regular thread dumps and extra thread dumps when the problems start. (This can be quite time consuming--you probably want IA support involved.)

In my experience, this is often caused by mishandling of long-lived asynchronous threads.

Kevin.Herron · May 8, 2025, 1:49pm

Is your Ignition Gateway running on real hardware or in a VM?

jacob_geers · May 8, 2025, 2:02pm

VMware ESXi 8.0.3.

I will monitor threads on my own for a while, then make a support request. Thank you both very much!

pturmel · May 8, 2025, 2:04pm

Ewww!

Have you followed all of the IA guidance on setting low latency mode and avoiding overcommitted resources? (VMware is poorly configured by default for latency-sensitive applications like Ignition.)

Kevin.Herron · May 8, 2025, 2:05pm

https://support.inductiveautomation.com/hc/en-us/articles/12888090388237-Considerations-and-Best-Practices-for-VMs-Hosting-Ignition