G1GC Garbage Collection - Init & max value

Hello everyone,

I have a couple of questions about the best-practice configuration of G1GC on Ignition.
First of all, is it best practice to set the init and max values to be identical on Ignition?
According to this link:

The VM allocating or giving back memory from the operating system memory may cause unnecessary delays. Avoid the delays by setting minimum and maximum heap sizes to the same value using the options -Xms and -Xmx, and pre-touching all memory using -XX:+AlwaysPreTouch to move this work to the VM startup phase.
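For concreteness, on the plain JVM command line that advice would come out as something like the following (the 8 GB figure is just an example for the sketch, not a recommendation):

```
-Xms8g -Xmx8g -XX:+AlwaysPreTouch
```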

I checked some other previous replies on the forum, and in this thread the recommendation is to allow about 30% of extra headroom on top of peak memory usage.

I wanted to understand which approach to try on a server where the Thread Viewer causes Clock Drift warnings.

The current config is:


This is on a server with 64 GB of RAM.

In addition, I wanted to confirm that the values passed to ‘wrapper.java.initmemory’ and ‘wrapper.java.maxmemory’ override the following when launching the JVM:
-XX:InitialHeapSize — Minimum Java heap size
-XX:MaxHeapSize — Maximum Java heap size
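For reference, a minimal ignition.conf sketch with both values pinned at 8192 MB might look like this (the size is an assumption; the service wrapper translates these into the equivalent -Xms/-Xmx arguments):

```
# Initial Java Heap Size (in MB); becomes -Xms / -XX:InitialHeapSize
wrapper.java.initmemory=8192

# Maximum Java Heap Size (in MB); becomes -Xmx / -XX:MaxHeapSize
wrapper.java.maxmemory=8192
```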

Thank you very much for any support on this.

All the best,

G1GC doesn’t do this in JDK11, so it’s not relevant.

Yes, they are the same thing.

Your config seems fine. The Thread Viewer doesn’t perform well when there are a lot of threads (hundreds, maybe thousands on some systems). Your memory and GC settings aren’t really going to do anything about this.

Hmm, actually it does in a few cases, but they really enhanced the behavior in JDK 12: https://openjdk.java.net/jeps/346

G1GC doesn’t do this in JDK11, so it’s not relevant.

We are using JDK 8 at the moment. The Oracle documentation describes JDK 9, but it appears to be the case in JDK 8 as well.

Yes, they are the same thing.

Thank you!

I was thinking about maybe enabling GC logging to see the latency of STW events; the machine is suffering brief hangs from time to time, which should not be normal. The server is connected to about 100 devices, but memory usage does not really exceed 8 GB. So the heap starts at 8 GB as set in the wrapper, but then it shrinks.
[screenshot: memory usage trend, 2020-07-10]

Yes, enabling the GC logging is a good place to start. Your memory trend looks pretty good though.
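On JDK 8 that could be done with extra JVM arguments in ignition.conf, roughly like the sketch below (the .5/.6/.7 indices and the log path are assumptions; use the next free indices in your own file):

```
wrapper.java.additional.5=-XX:+PrintGCDetails
wrapper.java.additional.6=-XX:+PrintGCDateStamps
wrapper.java.additional.7=-Xloggc:logs/gc.log
```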

Your memory trend looks pretty good though.

It does!

Any recommendation on MaxGCPauseMillis? The default value is 20, which should be fine.

The default is 200, not 20, isn’t it?

@pturmel has had success using 100 instead, I think. I don’t typically recommend anybody start making changes unless they know what they’re doing and how to measure the difference. The defaults are usually good enough.
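For anyone experimenting anyway, the goal is passed as just another JVM argument in ignition.conf, e.g. (the index is an assumption; use the next free one in your file):

```
wrapper.java.additional.8=-XX:MaxGCPauseMillis=100
```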

You definitely want to print GC details when you suspect pause-the-world issues. The details are pretty noisy, like so:

2020-07-10T13:32:09.719-0400: 941803.106: [GC pause (G1 Evacuation Pause) (young), 0.0275772 secs]
   [Parallel Time: 20.7 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 941803107.5, Avg: 941803107.5, Max: 941803107.6, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 9.5, Avg: 10.6, Max: 11.7, Diff: 2.2, Sum: 21.1]
      [Update RS (ms): Min: 1.5, Avg: 2.2, Max: 2.9, Diff: 1.4, Sum: 4.5]
         [Processed Buffers: Min: 29, Avg: 31.5, Max: 34, Diff: 5, Sum: 63]
      [Scan RS (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 7.0, Avg: 7.3, Max: 7.7, Diff: 0.7, Sum: 14.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 0.2]
      [GC Worker Total (ms): Min: 20.3, Avg: 20.3, Max: 20.4, Diff: 0.0, Sum: 40.7]
      [GC Worker End (ms): Min: 941803127.8, Avg: 941803127.9, Max: 941803127.9, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 6.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 4.7 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.6 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.5 ms]
   [Eden: 261.0M(261.0M)->0.0B(262.0M) Survivors: 6144.0K->5120.0K Heap: 334.3M(446.0M)->74058.2K(446.0M)]
 [Times: user=0.05 sys=0.00, real=0.03 secs] 

Note the actual pause time right in that first line: 27.6 ms. grep for “GC pause” to see the important stuff.
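As a rough sketch of that filtering (the sample line is copied from the log above; in practice you would run the grep against the file that -Xloggc points at):

```shell
# One summary line from the JDK 8 G1 log above; normally: grep 'GC pause' gc.log
line='2020-07-10T13:32:09.719-0400: 941803.106: [GC pause (G1 Evacuation Pause) (young), 0.0275772 secs]'

# Keep only the pause summary lines and print the trailing duration field.
echo "$line" | grep 'GC pause' | awk -F', ' '{ print $NF }'
# prints: 0.0275772 secs]
```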

The setting for MaxGCPauseMillis is a goal, not a guarantee, fwiw.

I’ve had success with my Ethernet/IP module with that set as low as 50ms, on an otherwise lightly loaded server, but CPU usage does go up. Also make sure you have plenty of cores. You want to ensure no event waits more than a few milliseconds to run, and runs in a few milliseconds. Push anything more intensive into a dedicated thread, and ensure you have cores for the worst case with such threads.

The default is 200, not 20, isn’t it?

yeah, 200! I forgot a 0! :slight_smile:


You definitely want to print GC details when you suspect pause-the-world issues.
This is my plan for next week: I will set up the logging in the wrapper and then consider solutions if that turns out to be the problem.
I will update the thread if something comes up.

Thank you very much for your replies.