I am not sure why this is happening, but Ignition is maxing out the RAM that it has. We have tried several things, including increasing RAM on the VM, increasing processors, making the init RAM equal to the max RAM in the .conf file, and adding this change:
wrapper.java.additional.2=-XX:MaxGCPauseMillis=100
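For reference, the memory lines in ignition.conf currently look roughly like this (I'm going from memory, so treat the values as approximate):

wrapper.java.initmemory=12288
wrapper.java.maxmemory=12288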
None of it has seemed to do much of anything.
What happens on a server restart is that RAM and CPU usage start at a reasonable number, then the memory builds, drops (but not by enough), and loops like that until the RAM is maxed and we get the clock drift error. I thought it was the garbage collector not working correctly, but maybe it is a memory leak, in which case I have no clue how to find it.
Thanks for the help.
A memory leak would be my bet, and they aren't easy to hunt down at all. Generally, look for things that are creating background threads or using persistence to try to maintain values across gateway startups.
That being said, how much RAM is available on the Gateway, and is there any other software besides Ignition running on this machine?
The gateway was just upped from 8 GB to 12 GB, and there is 24 GB on the VM. As for what else is on the machine, there isn't really anything else, just the normal Windows operating system processes. Ignition is using 13.5 GB, and the next closest is an antimalware process at 250 MB.
Unless you are highly familiar with Java debugging tools, you probably want IA support to help you nail down where you are using/losing memory.
I am not, so I guess I will give them a call. Thanks for the quick responses, guys.
You might also check your logs and see if you're getting any other errors. I had a similar issue a week ago and realized my database was having problems. (I don't recommend the RocksDB storage engine for MariaDB; it was causing table locking issues, so I switched to Timescale on Postgres.)
I am getting other errors; I guess I assumed they coincided with the RAM being maxed out. They come from ABControlLogixTransportPool, ABControlLogixReadRequest, and TimeoutDaemon, and they all happen at the same time the ClockDriftDetector error happens.
It is normal for a GC stall that long to also delay network responses for those drivers. Those errors are caused by the stall; they are not the cause of the stall.
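If you want to confirm that, you can turn on GC logging and check whether long pauses line up with those errors. On an 8.x gateway (Java 11), something along these lines in ignition.conf should work; the .additional index and log path here are just examples, so adjust them to your file:

wrapper.java.additional.3=-Xlog:gc*:file=logs/gateway-gc.log:time,uptime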
I took a lunch and came back, and the memory trend looks good. I didn't do anything except that I realized the wrapper.java.additional.2=-XX:MaxGCPauseMillis=100 line was commented out of the .conf file, so I uncommented it. It is my understanding that memory leaks do not resolve themselves. After that change and a restart it was still acting up, but after 45 minutes it seems to have resolved itself. Is it normal for a change to take 45 minutes to take effect, or is this likely to be a problem again in the near future?
Without examining a heap dump captured during the problem, there's no good way to tell.
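If it does come back, one thing you can capture while the memory is climbing is a heap dump with jmap, which ships with the JDK (the gateway's bundled runtime may not include it). The PID and output path below are just placeholders:

jmap -dump:live,format=b,file=C:\temp\gateway-heap.hprof <gateway-java-pid>

IA support can then dig through the dump with a tool like Eclipse MAT to see what is holding the memory.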
OK, nothing I can do now then. Thanks again for the help.