Clock Drift Pause the World Detected

jespinmartin1 · September 3, 2021, 2:56am

We have a decent server with the Java heap set to 20GB. However, I just saw a "pause the world" event detected. I thought this was only possible when memory use nearly hits the heap limit, but in the time range where the event happened, memory use was far from 20GB. In fact, the average peak is always about 10GB.
Question
What else can produce this clock drift?

Note: Not using Mozilla

JordanCClark · September 3, 2021, 10:51am

@dave.fogle had put up an article on this.

321liftoff · September 3, 2021, 11:52am

I have found that these messages don’t appear in the log until I start opening the performance pages, such as the thread diagnostics and the overall CPU/RAM usage.

I didn’t see this explanation in the article, so I don’t know if that’s expected or not.

pturmel · September 3, 2021, 11:54am

Your actual delays are just over the limit, which suggests the JVM’s pause target is one second. You should set that much lower. I typically set a target of 100 or even 50 milliseconds.

Share your ignition.conf file…
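
For readers wondering why a pause shows up as "clock drift" at all, here is a minimal sketch of how such a detector could work. This is an assumption for illustration, not Ignition's actual implementation: the class name, the helper names, the 200 ms sleep interval, and the 1000 ms threshold are all hypothetical. The idea is that a thread asks to sleep for a fixed interval and compares that with the time that really elapsed; anything that stalls the JVM (a GC pause, a thread dump, host contention) makes the sleep overshoot.

```java
import java.util.concurrent.TimeUnit;

class DriftSketch {
    // Hypothetical helper: how many milliseconds longer than requested
    // did the sleep actually take, measured with the monotonic clock?
    static long measureDriftMillis(long sleepMillis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMillis);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return elapsedMillis - sleepMillis;
    }

    // Hypothetical helper: a drift counts as an event only above the threshold.
    static boolean exceeds(long driftMillis, long thresholdMillis) {
        return driftMillis > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long thresholdMillis = 1000; // assumed reporting limit, for illustration only
        for (int i = 0; i < 5; i++) {
            long drift = measureDriftMillis(200);
            if (exceeds(drift, thresholdMillis)) {
                System.out.println("Overshoot of " + drift + " ms detected");
            }
        }
    }
}
```

A detector like this cannot tell what caused the overshoot, which is why heap usage alone does not explain every event.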

pturmel · September 3, 2021, 11:54am

This one. Yes, expected. Avoid looking at threads, as collecting the details on them disrupts the JVM itself.

bschroeder · September 3, 2021, 3:05pm

Could you expound on this a bit?

pturmel · September 3, 2021, 3:08pm

The JVM can’t capture thread details while the threads are running, so there’s a brief pause while it collects information about all threads for a thread dump. On a busy system with lots of threads, this is noticeable.
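
To get a rough feel for this cost, the sketch below times a full thread dump through the standard `java.lang.management` API. The class and method names are made up for illustration, and it is an assumption that the gateway's thread page does something comparable internally. The measured time also includes building the result objects, so treat it as an upper bound on the pause rather than an exact figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpCost {
    // Capture stack traces for every live thread, including lock details
    // only where this JVM reports that it supports them.
    static ThreadInfo[] dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        ThreadInfo[] infos = dumpAllThreads();
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.printf("Dumped %d threads in %d us%n", infos.length, elapsedMicros);
    }
}
```

Run standalone, this only sees a handful of threads; inside a busy JVM with hundreds of threads the same call takes correspondingly longer.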

Kevin.Herron · September 3, 2021, 3:09pm

It seems a little suspicious that the events are frequently 10s apart or a multiple of 10s. Is this really 7.9? Is there a 10s gateway timer script or anything like that?

bschroeder · September 3, 2021, 3:13pm

Ok… And dropping the pause length will help as well? How do you tune that properly?

pturmel · September 3, 2021, 3:13pm

You should have a line in ignition.conf like so:

wrapper.java.additional.2=-XX:MaxGCPauseMillis=100

The G1GC algorithm will run more often when pauses exceed that threshold, and run less often when pauses are substantially less. It is a heuristic, not a hard rule, but G1GC is pretty good at keeping pauses under the target maximum.
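
If you want to confirm that the flag was actually picked up, one option is to inspect the JVM's startup arguments and GC counters through the standard management beans. The following is a sketch only: the class and helper names are hypothetical, and run standalone it reports on its own JVM, so it would have to execute inside the gateway's JVM to say anything about the gateway.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.OptionalLong;

class GcSettingsCheck {
    static final String FLAG = "-XX:MaxGCPauseMillis=";

    // Hypothetical helper: extract the pause target from a list of JVM arguments.
    // The last occurrence is kept, on the assumption that a later duplicate
    // overrides an earlier one.
    static OptionalLong pauseTargetMillis(List<String> jvmArgs) {
        OptionalLong result = OptionalLong.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                result = OptionalLong.of(Long.parseLong(arg.substring(FLAG.length())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        OptionalLong target = pauseTargetMillis(jvmArgs);
        System.out.println(target.isPresent()
                ? "Pause target: " + target.getAsLong() + " ms"
                : "Pause target not set explicitly; the JVM default applies");

        // Cumulative collection counts and times per collector since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Comparing the collection counts and total times before and after changing the target gives a rough view of the trade-off described above: more frequent, shorter collections.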

bschroeder · September 3, 2021, 3:13pm

Awesome. Always love learning better ways to tune these systems.

pturmel · September 3, 2021, 3:14pm

Isn’t 10 seconds the refresh pace on gateway status pages?

Kevin.Herron · September 3, 2021, 3:23pm

Not sure. I think the thread viewer page is 5s. Don’t know about others, but that’s the heaviest one I think.

jespinmartin1 · September 10, 2021, 9:42pm

Sorry for the delay.
Thanks for all the info.
Indeed, as someone mentioned, this happened because I was checking the threads page, as shown: