Gateway Performance Issues and Tuning

I've got a gateway that our company is maintaining for a customer. We're trying to track down a performance issue, and I need some advice on where to go.

This is a Perspective and WebDev gateway. It's the central gateway in the customer's data center, communicating with multiple smaller remote sites via the Gateway Network. Some data is synced across the Gateway Network using gateway timer scripts.

The issue is that every so often CPU usage in Ignition will spike, memory will spike, and ClockDriftDetector warnings will occur. Usually the GC is able to bring that memory back down; however, in extreme situations the Java heap seems to fill up and then things start to crash.

I'm trying to isolate the cause, or more specifically prove where it isn't.

For instance, this is a cut from the performance page: [screenshot of the gateway performance charts]

This gateway is installed on Windows Server 2016, in a Hyper-V environment. I don't have access to the Hyper-V Host side of things so I can't verify that end. I've been told that all the resources are properly allocated and it isn't overcommitted.

The only software on the guest is Ignition, Windows, and the Carbon Black Cloud Sensor antivirus.

Currently, the Java heap is set to 8 GB.
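For reference, the heap size lives in ignition.conf as the wrapper memory settings (values in MB; the initmemory value below is just illustrative, not necessarily what's deployed):

wrapper.java.initmemory=1024
wrapper.java.maxmemory=8192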

Where do I start looking to try to get this working better? Is this a simple memory issue, or possibly something else?

Thanks in advance...


Bit of an odd memory usage pattern. Typical is a sawtooth. A healthy sawtooth tends to ramp from 40%-ish (or less) to 80%-ish, then GC knocks it back down.

Except: long time spans for requested graphs, especially in the reporting module, can randomly spike memory. A couple or more such requests together could push a server over the edge. Perspective also loads a server much more than Vision, so a bunch of busy clients can do something similar.

Consider this similar topic: [link to a related forum topic]

Maybe also set up GC detail logging in ignition.conf. See how long your pauses usually are.

Yeah... that's something I noticed on the memory side as well.

Specifically, this gateway has no connected devices. It's used as a clearinghouse for remote data monitoring, and for handling truck ticketing at the remote sites.

Do you have any recommended sites so I can learn how to set up Java Flight Recorder? And any specific flags to add to ignition.conf to get the GC detail logging?

I happen to be using it right now on one of my VMs for driver testing, with this in ignition.conf.

wrapper.java.additional.9=-XX:+FlightRecorder 
wrapper.java.additional.10=-XX:StartFlightRecording=maxsize=2g,filename=porthos81z.jfr

2GB is probably too big. (:

And you probably want the dumponexit option.
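For example, something along these lines (the size and filename here are just placeholders to adapt):

wrapper.java.additional.10=-XX:StartFlightRecording=maxsize=512m,dumponexit=true,filename=gateway.jfr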

Some docs here: [link to Oracle's Java Flight Recorder documentation]. Note that the UnlockCommercialFeatures clause went away at some point (JFR stopped being a commercial feature as of JDK 11).

Follow the links for the jcmd docs for things you can do while running, including getting early exports from JFR.
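A rough sketch of that jcmd workflow (the pid is whatever jcmd lists for the gateway's JVM):

jcmd                                            # list running JVMs and their pids
jcmd <pid> JFR.check                            # show active recordings
jcmd <pid> JFR.dump filename=/tmp/snapshot.jfr  # export what has been recorded so far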

For the GC reporting, I use this:

wrapper.java.additional.3=-Xloggc:/var/log/ignition/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log
wrapper.java.additional.6=-XX:+PrintGCDetails

Adjust that file name pattern to suit your needs. As shown, you'll need to clean them out manually.
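If the gateway is on a newer JDK (9+), where -XX:+PrintGCDetails is deprecated, the unified-logging equivalent would be something like the line below (the decorators and rotation settings are just one reasonable choice, not the only one):

wrapper.java.additional.3=-Xlog:gc*:file=/var/log/ignition/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log:time,uptime:filecount=5,filesize=10m

With filecount set, the JVM rotates the logs itself.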

Thanks!
I've got a local gateway in our office I can play with to try and learn how to decipher the data.

Do you think that maybe there just isn't enough memory assigned? I can add quite a bit more to the gateway; I just don't want to go that route and end up covering something else up.

To look at the files you get, you will want "JDK Mission Control" and/or "VisualVM". I like the latter for looking at HPROF dumps.

It's possible. I'd look at the GC details first. And if you get another crash after JFR is set up, look at that, too.
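If you want the JVM to capture an HPROF dump automatically when the heap actually fills, you could also add something like this (the .N indexes are placeholders; use whatever numbers are free in your conf):

wrapper.java.additional.11=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.12=-XX:HeapDumpPath=/var/log/ignition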

Thanks. I also reached out to support to see if they have any ideas as well.

Any good tools for examining the GC logs?

The GC dumps are the HPROF "heap profile" files I mention above. The GC logs are just text files. I use grep to pick out the pauses.
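Something like this gets you started (the exact pattern depends on your collector and JDK version):

grep -iE "full gc|pause" /var/log/ignition/javagc-*.log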

Thanks.

Anything in particular I should be looking for in the heap dumps? I finally got my local VM working correctly, and VisualVM is working. I can dump the heaps and take a look locally. This isn't the same gateway as the problem child, but at least I can get a bit of familiarity with what I'm seeing, and with the process as well.

When there's a memory leak, something usually sticks out like a sore thumb. I'm usually focused on my own stuff, looking at allocations by class.
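If you'd rather capture an HPROF dump from the command line instead of through VisualVM's UI, jmap can do it (pid as before; the live option limits the dump to reachable objects):

jmap -dump:live,format=b,file=/tmp/gateway-heap.hprof <pid>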