When 'Clock Drift' occurs continuously, which part should be checked?

My Environment

  • DELL PowerEdge, Intel Xeon series
  • RAM: 64GB
  • Ignition: 8.1.26 (b2023032308)
  • 75,000 tags
  • Ignition's ignition.conf
    • wrapper.java.initmemory=4096 (4GB)
    • wrapper.java.maxmemory=32768 (32GB)

Ignition is in operation with the above environment and settings.
About a month ago, the memory graph's sawtooth was climbing close to the 32GB Max Memory value, leaving only about 2-3GB of headroom.
At the time there were fewer than five active client sessions, and the memory allocated to the clients totaled less than 1GB (about 150MB per session).
We therefore suspected an accumulating memory leak, but could not identify the cause; since it was an acceptable time to restart the Gateway (the service could be interrupted), we restarted it in a hurry.

Then yesterday we confirmed that memory had once again filled up to the maximum, and this time we took some countermeasures based on forum posts and googling:

  1. Set the values of initmemory and maxmemory the same.
  2. Add GC-related parameter: wrapper.java.additional.3=-XX:MaxGCPauseMillis=100
  3. Restart the gateway (2023-11-07 18:40).
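In ignition.conf terms, steps 1 and 2 would look something like the following (the additional-argument index `3` is taken from the post; your own index may differ):

```
wrapper.java.initmemory=32768
wrapper.java.maxmemory=32768
wrapper.java.additional.3=-XX:MaxGCPauseMillis=100
```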

When I came in to work today and checked the Gateway, there was no longer any trend of accumulating memory, but a 'Clock Drift' warning had occurred.

(No Perspective or Designer sessions were open at the time.)

At this point, I wonder:

  • Why did the clock drift occur even though memory never reached its limit?
  • Which areas or methods should I look at to determine the cause of the issue?
  • (A little off topic) Is it possible to tune or monitor the GC?

I know the information needed to answer this will differ by user and use case, but at the moment I'm not sure where to start.
If you can give any advice or know of a solution, please let me know.
Thank you.

Check out this
https://support.inductiveautomation.com/hc/en-us/articles/360047576511-ClockDriftDetector-warnings-in-the-Gateway-Console


75k tags is fairly small, especially with a 32GB max heap assigned - you didn't mention your core count - but this sounds like something project-related rather than under-resourcing.

I would audit your scripts, checking for efficiency and for any that run excessively long (time them using java.lang.System.nanoTime() and report via loggers if they run longer than 100ms):

  • tag change scripts
  • gateway event scripts
  • perspective scripts
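A minimal sketch of that timing-and-logging idea, as a decorator you could paste around a suspect script body. In a real gateway script you would use system.util.getLogger(...) and java.lang.System.nanoTime(); the sketch below substitutes Python's logging module and time.monotonic() so it runs outside the gateway, and the 100ms threshold is the figure suggested above:

```python
import time
import logging

logging.basicConfig(level=logging.INFO)

# Threshold from the advice above: flag anything over 100 ms.
SLOW_THRESHOLD_MS = 100


def timed(logger_name, threshold_ms=SLOW_THRESHOLD_MS):
    """Log a warning whenever the wrapped function runs longer than threshold_ms.

    Sketch only: in Ignition, swap logging.getLogger for
    system.util.getLogger and time.monotonic for
    java.lang.System.nanoTime (dividing by 1e6 for milliseconds).
    """
    logger = logging.getLogger(logger_name)

    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.monotonic() - start) * 1000.0
                if elapsed_ms > threshold_ms:
                    logger.warning("%s took %.1f ms", fn.__name__, elapsed_ms)
        return inner
    return wrap


# Hypothetical tag-change handler signature, for illustration only.
@timed("tagchange.audit")
def valueChanged(tag=None, tagPath=None, previousValue=None,
                 currentValue=None, initialChange=None, missedEvents=None):
    pass  # ... your tag-change logic here ...
```

Wrapping each suspect script this way gives you a log trail of which ones exceed the threshold, without changing their behavior.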

Check for scripts using time.sleep as well.

Also check for multiple calls to system.alarm.queryStatus.
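One way to cut down repeated queryStatus calls is a small time-to-live cache so several scripts share one result. This is a generic sketch, not an Ignition API: the CachedCall class and its parameter names are hypothetical, and in a gateway you would wrap system.alarm.queryStatus with it; a plain function stands in here so the sketch runs anywhere:

```python
import time


class CachedCall:
    """Cache a function's result for ttl_seconds.

    Sketch: wrapping an expensive call (e.g. an alarm-status query) this way
    lets many scripts polling it reuse one result instead of each hitting
    the underlying subsystem. Caches a single, argument-free call.
    """

    def __init__(self, fn, ttl_seconds=5.0):
        self.fn = fn
        self.ttl = ttl_seconds
        self._value = None
        self._stamp = None  # monotonic timestamp of the cached value

    def __call__(self):
        now = time.monotonic()
        if self._stamp is None or now - self._stamp > self.ttl:
            self._value = self.fn()
            self._stamp = now
        return self._value
```

Usage would look like `active_alarms = CachedCall(my_query_fn, ttl_seconds=5.0)`, then calling `active_alarms()` from each script.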

If there are lots and lots of scripts running all the time, that could well be the cause.


If your clock drift corresponds to the GC stop-the-world pauses at the sawtooth drop-offs, then the amount of garbage being cleaned up in each cycle is too much for your CPUs to handle quickly.

Please add GC logging to your ignition.conf and post the results for a few sawtooth cycles. Something like this:

wrapper.java.additional.20=-Xloggc:logs/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log
wrapper.java.additional.21=-XX:+PrintGCDetails

Make sure the 20 and 21 are unique integers in the list of additional java arguments.
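For reference, on the Java 11/17 runtimes bundled with Ignition 8.1, -Xloggc and -XX:+PrintGCDetails should be treated as deprecated aliases for unified JVM logging. An equivalent unified-logging form would be something like this (same caveat about choosing a unique index):

```
wrapper.java.additional.20=-Xlog:gc*:file=logs/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log:time,uptime
```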
