High CPU Usage - Diagnosing

If you haven’t adjusted your ignition.conf to use G1GC instead of CMS, do that first. Seriously.
Also use the GC logging options until you have finished tuning.

We are using the G1 garbage collector.

I tried looking at the thread dumps to see if I can get more information on the gateway-shared-exec threads that are being blocked.
thread_dump (2).txt (101.2 KB)

I doubt the thread dumps will help. You have a memory leak. You should start with a class usage histogram captured with the jmap tool or equivalent. Compare that to a histogram captured before the memory consumption ramps up. Note that Ignition will stall for a few seconds while the histogram is captured in most cases.
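
For anyone who wants to automate the before/after comparison, here is a minimal sketch in Python. It assumes both captures were saved as text from `jmap -histo <pid>` and that the rows follow the usual `num: #instances #bytes class name` layout. The function names are made up for illustration, and the parsing may need adjusting for your JVM version.

```python
import re

# Matches data rows such as "   1:        123456       98765432  java.lang.String".
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text):
    """Return {class_name: (instances, bytes)} from jmap -histo style text."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return result


def top_growth(before_text, after_text, limit=10):
    """List (byte_delta, instance_delta, class_name), largest byte growth first."""
    before = parse_histogram(before_text)
    after = parse_histogram(after_text)
    growth = []
    for name, (instances, size) in after.items():
        old_instances, old_size = before.get(name, (0, 0))
        growth.append((size - old_size, instances - old_instances, name))
    growth.sort(reverse=True)
    return growth[:limit]
```

Read the two saved files into strings and pass them to `top_growth`. The classes at the top of the result are the first candidates to look at for a leak.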

The named query cache maintenance system got revamped (for performance) in 7.9.9. That’s the root cause of the issue.
You probably didn’t encounter the issue on your dev system because without active users you wouldn’t be building up a significant cache. Since 7.9.9 currently has an RC out, the release should be out within a couple weeks.

So will turning off caching on my named queries help mitigate the issue I am currently having?
Is it possible to run clearNamedQueryCache from the Script Console in my Designer to clear the cache?

I believe disabling caching should help, but I am not very familiar with the caching mechanism so I’m not entirely sure what the change involved - only that it was specifically designed to fix this kind of issue.

Unchecking the cache option on the named queries solved the high CPU usage issue.

Guys, I need help.

Server Specs:

  • HP ProLiant G9, 4-core CPU, no Hyper-Threading
  • Windows Server 2008 R2 Standard VM, 4 CPU cores, 12 GB RAM

Ignition Gateway:

  • ver. 7.9.6
  • 45564 Tags
  • 4 PLC connections
  • MySQL DB
  • 6 Client Stations

Issues in the Field:

  • After the Gateway has been running for a long time, CPU usage goes up to 100%, which causes operational problems.
  • When we reboot the Gateway, it goes back to normal.
  • Is there a fix?

Using our office server, a Dell PowerEdge with a 24-core CPU and 128 GB RAM that supports Hyper-Threading:

  • We run the Gateway server VM with 8 CPU cores and 12 GB of RAM, and it works fine.
  • But that is only for our simulation test.

Screenshots of the Gateway status after reboot:

A snapshot after reboot isn’t much help. A snapshot while the problem is occurring would be helpful. But also post your ignition.conf, as requested in the other post.

P.S. Please don’t post in multiple topics for the same problem. This should all be in the new topic.

I am glad this topic popped up again as we are seeing High CPU usage on our Gateway:

The CPU ranges anywhere from 15% to 99%. There are several PTW events logged:

We have a relatively large project: 7.9.9, MES-heavy (Sepasoft), 4 sites running on a single Gateway with over 180 clients, 160 queries/sec, and 143 devices. The Gateway runs on an ESX Cluster Node, with 8 other VMs that use very little resources (our VM is by far the top consumer).

I have attached a copy of our ignition.conf file and a thread dump taken during a CPU spike.
thread_dump.txt (619.2 KB)
ignition.conf.txt (8.4 KB)

You have both CMS and G1GC set in your ignition.conf. G1GC is last, so that was chosen, but I recall reading that CMS settings can interfere with G1GC. Get rid of the wrapper.java.additional.1, .2, and .3. Add a target pause time. Consider logging garbage collector performance to determine if that is the cause of your clock drift warnings. See these topics:

Thanks @pturmel!
Here’s what the GC section looks like now:

wrapper.java.additional.1=-XX:+UseG1GC
wrapper.java.additional.2=-XX:MaxGCPauseMillis=100
wrapper.java.additional.3=-Ddata.dir=data
wrapper.java.additional.4=-Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false

We have a planned release to production tomorrow and we will restart the Gateway service for these to take effect. I will keep this thread updated.

Oscar.

Consider including the -Xloggc:.... and -XX:+PrintGCDetails options.
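
Once a GC log exists, a small script can pull out the pauses that exceed the pause target. The following is a rough sketch in Python, assuming the Java 8 G1 log layout where each pause is summarized on one line ending in `N.NNNNNNN secs]`. The function name and the sample format are assumptions, so check them against your actual log before trusting the output.

```python
import re

# Assumed Java 8 style summary lines, for example:
# "2018-12-13T12:27:56.123-0500: 12.345: [GC pause (G1 Evacuation Pause) (young), 0.0234567 secs]"
# "...: [Full GC (Allocation Failure)  7168M->5000M(8192M), 3.5000000 secs]"
PAUSE = re.compile(r"\[(GC pause|Full GC)\b.*?(\d+\.\d+) secs\]")


def long_pauses(log_text, target_ms=100.0):
    """Return (kind, pause_ms, line) for every pause longer than target_ms."""
    hits = []
    for line in log_text.splitlines():
        m = PAUSE.search(line)
        if m:
            pause_ms = float(m.group(2)) * 1000.0
            if pause_ms > target_ms:
                hits.append((m.group(1), pause_ms, line.strip()))
    return hits
```

If the timestamps of the long pauses line up with the clock drift warnings in the wrapper log, the collector is a likely cause. If they don’t, look elsewhere.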

Got it, this is what it looks like now:

wrapper.java.additional.1=-XX:+UseG1GC
wrapper.java.additional.2=-XX:MaxGCPauseMillis=100
wrapper.java.additional.3=-Ddata.dir=data
wrapper.java.additional.4=-Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false
wrapper.java.additional.3=-Xloggc:…/logs/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log
wrapper.java.additional.5=-XX:+PrintGCDetails
wrapper.java.additional.6=-XX:+PrintGCTimeStamps
wrapper.java.additional.7=-XX:+PrintGCDateStamps

The “additional” numbers must be unique. After cutting and pasting the lines, renumber the ones that aren’t commented out.
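
If renumbering by hand gets tedious, something like this can do it. It is a minimal Python sketch, not an official tool: the function name is made up, and it assumes that commented-out lines start with `#` and that every active entry has the form `wrapper.java.additional.N=value`.

```python
import re

# Matches an active (uncommented) entry such as
# "wrapper.java.additional.3=-Ddata.dir=data". Lines starting with "#" do not match.
ADDITIONAL = re.compile(r"^(\s*wrapper\.java\.additional\.)\d+(=.*)$")


def renumber_additional(conf_text):
    """Renumber active wrapper.java.additional.N entries as 1, 2, 3, ... in file order."""
    counter = 0
    out = []
    for line in conf_text.splitlines():
        m = ADDITIONAL.match(line)
        if m:
            counter += 1
            line = "%s%d%s" % (m.group(1), counter, m.group(2))
        out.append(line)
    return "\n".join(out)
```

Run it on a copy of ignition.conf and diff the result against the original before replacing anything. All other lines, including commented-out ones, pass through unchanged.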

Boy am I dense today! Thanks, corrected it.
A quick update: our ESX engineers moved 6 (out of 9 total) other VMs that were on our same host to a different one, and this happened:

We will have to define some parameters for them to isolate our VM or guarantee its resources. Is there a set of best practices for running Ignition in a virtualized environment?

Thanks!
Oscar.

An update on this and seeking some more help:

We added two vCPUs to our VM yesterday and restarted the Gateway after the updates to the ignition.conf to use the new Garbage Collector exclusively.

Here’s what our ignition.conf looks like:
wrapper.java.additional.1=-XX:+UseG1GC
wrapper.java.additional.2=-XX:MaxGCPauseMillis=100
wrapper.java.additional.3=-Ddata.dir=data
wrapper.java.additional.4=-Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false
wrapper.java.additional.5=-Xloggc:…/logs/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log
wrapper.java.additional.6=-XX:+PrintGCDetails
wrapper.java.additional.7=-XX:+PrintGCTimeStamps
wrapper.java.additional.8=-XX:+PrintGCDateStamps

After adding the 2 vCPUs (6 vCPUs total), our CPU usage dropped considerably:

However, our memory profile is looking a bit odd, and we had 2 crashes overnight (we had to restart the gateway).
Our Memory profile before the GC changes:

and after:

We did not get a chance to grab a thread dump, but the logs indicate issues connecting to devices (OPC UA) and memory full events for alarms.
It also looks like this line:
wrapper.java.additional.5=-Xloggc:…/logs/javagc-%WRAPPER_TIME_YYYYMMDDHHIISS%.log
is not working, as I don’t see any Java GC logs in the logs folder. Maybe this could be causing some overhead?

Thanks,

Oscar.

So using G1GC with a pause target really smoothed out your memory profile. It’s not clear to me why you have no GC log file. There’s probably an error in the wrapper log at startup indicating the problem with that parameter. Consider using a full path to the log file.
The memory full indication suggests that something is happening that spikes memory usage and 8GB of RAM isn’t enough to handle it. Whether it is a bug causing a memory leak or a legitimate need for lots of memory (I’m looking at you, reporting!), I can’t tell. There are probably clues in your wrapper log. Can you give the VM a bunch more memory (increasing the max in ignition.conf accordingly) to see if you can ride through the spike? If so, and the profile returns to the vicinity of 6GB afterwards, then I would suggest it is a legitimate load.

Hi,
I will try that; I’ll report back with the results of increasing the RAM.
One of the common errors we saw in the logs was around alarm journaling; we disabled it around 9:30AM this morning and that helped the load. This is likely not the culprit, but most definitely a contributor.

You are spot on about the GC log file… I will try using backslashes for the path next (…\logs\javagc…)
INFO | jvm 1 | 2018/12/13 12:27:56 | Java HotSpot™ 64-Bit Server VM warning: Cannot open file …/logs/javagc-20181213122746.log due to No such file or directory

Oscar.