I’m looking to reduce the relatively high average CPU usage of an Ignition project (~58%, with Ignition itself using ~45-50%), but I’m not sure of the best place to start looking.
Are there any guidelines to designing an efficient project?
Is there a particular log file that I can look at? I have found the Threads diagnostics log contains the threads and their CPU usage, however none of the names mean much to me. The majority of the high CPU usage items, adding ~20% load, are named ‘webserver-xxxxxx’. What are these threads used for and can I reduce their usage at all?
The webserver is just the gateway webpage. Sadly, the performance reporting is quite heavy on the web server, so you’ll always see it there when you have that page open. But if you close the page, it shouldn’t use any noticeable amount of CPU anymore.
Instead, most of the CPU is wasted in the “unclassified” part, which suggests it’s not a CPU problem but rather the system waiting on something else, like disk reads for swapped RAM.
I’m going to double up on this thread. I’m having the same issue, but inconsistently. In my case though the high CPU usage ends up crashing the gateway because it takes up 98+% of gateway CPU. I have to manually terminate every open client and then reset the gateway to get the issue to go away, then CPU usage goes back to 10-20%.
I’m having a heck of a time figuring out what is causing the problem. Any advice for how to track this down?
This has happened maybe 3 times in the last 3-4 weeks. When I look at the thread diagnostics, the top 5 threads (using between 12-17% CPU each) are called webserver-xxxxxxx, as nminchin mentioned.
I’m the only person on our network that opens the gateway webpage, so does it make sense that there would be 5-10 separate threads for webserver running?
Can you take some thread dumps when this happens? The traces posted by nminchin don’t have anything helpful in them. The CPU usage figure and the stack trace you’re seeing are not in lockstep, unfortunately, so it might take some lucky timing to capture actual activity on the threads.
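Because a single dump may miss the activity, it can help to take several and see which threads are RUNNABLE in most of them. Here is a rough sketch of that idea. It assumes the dumps are in HotSpot `jstack` text format (a quoted thread name on the header line, followed by a `java.lang.Thread.State:` line); a dump exported from the gateway page may be laid out differently and would need different patterns. The function names are made up for illustration.

```python
import re
from collections import Counter

# Header line of a thread in jstack-style output, e.g.
#   "webserver-123" #45 daemon prio=5 os_prio=0 tid=0x... nid=0x... runnable
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# State line that follows the header, e.g.
#   java.lang.Thread.State: BLOCKED (on object monitor)
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def thread_states(dump_text):
    """Return {thread name: state} for one jstack-style dump."""
    states = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = THREAD_STATE.match(line)
        if state and current is not None:
            states[current] = state.group("state")
            current = None
    return states


def runnable_counts(dumps):
    """Count, per thread name, in how many dumps the thread was RUNNABLE."""
    counts = Counter()
    for dump_text in dumps:
        for name, state in thread_states(dump_text).items():
            if state == "RUNNABLE":
                counts[name] += 1
    return counts
```

Feeding it the text of, say, ten dumps taken a few seconds apart and calling `runnable_counts(dumps).most_common(10)` gives a short list of threads worth reading the stack traces for.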
I’ve finally got some time to grab some thread dumps. The CPU is now hanging around the 65% mark, and I would really like to find out why! I do have a fair few tag change events. Is it possible to tell from the dumps whether these are the culprits?
I’m having the same issue. We added more cores and memory to our server and are now seeing that high memory usage is causing our gateway to crash. We also have tag event scripts, but the odd thing is that we have a development gateway with the same code that is not experiencing the same crashes as our production Ignition gateway.
If you haven’t adjusted your ignition.conf to use G1GC instead of CMS, do that first. Seriously.
Also use the GC logging options until you have finished tuning.
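A quick way to check an ignition.conf against that advice is to scan its `wrapper.java.additional.N` lines. The sketch below is only an illustration: the function names are invented, and it looks for the standard HotSpot flags `-XX:+UseG1GC`, the CMS family (`-XX:+UseConcMarkSweepGC`, `-XX:+CMS...`), `-XX:MaxGCPauseMillis=` and `-Xloggc:` (Java 8) or `-Xlog:gc` (Java 9+). Which pause target or log path suits a given gateway is not something it can decide.

```python
import re

# Matches uncommented Java Service Wrapper entries such as:
#   wrapper.java.additional.4=-XX:+UseG1GC
# Lines starting with '#' do not match, so commented-out flags are ignored.
ADDITIONAL = re.compile(r'^\s*wrapper\.java\.additional\.(\d+)\s*=\s*(\S.*?)\s*$')


def jvm_args(conf_text):
    """Return {index: argument} for every wrapper.java.additional.N line."""
    args = {}
    for line in conf_text.splitlines():
        m = ADDITIONAL.match(line)
        if m:
            args[int(m.group(1))] = m.group(2)
    return args


def gc_findings(conf_text):
    """Return a list of human-readable GC configuration problems."""
    args = jvm_args(conf_text)
    values = list(args.values())
    findings = []
    # Any leftover CMS flag is reported with its index so it can be removed.
    for index, value in sorted(args.items()):
        if "ConcMarkSweep" in value or value.startswith(("-XX:+CMS", "-XX:CMS")):
            findings.append("CMS flag still present at index %d: %s" % (index, value))
    if "-XX:+UseG1GC" not in values:
        findings.append("G1GC is not enabled (-XX:+UseG1GC missing)")
    if not any(v.startswith("-XX:MaxGCPauseMillis=") for v in values):
        findings.append("no target pause time (-XX:MaxGCPauseMillis=<ms>)")
    if not any(v.startswith(("-Xloggc:", "-Xlog:gc")) for v in values):
        findings.append("GC logging is not enabled")
    return findings
```

Running `gc_findings(open("ignition.conf").read())` returns an empty list when G1GC, a pause target and GC logging are all set and no CMS flags remain.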
I tried looking at the thread dumps to see if I could get some more information on these gateway-shared-exec threads that are being blocked. thread_dump (2).txt (101.2 KB)
I doubt the thread dumps will help. You have a memory leak. You should start with a class usage histogram captured with the jmap tool or equivalent. Compare that to a histogram captured before the memory consumption ramps up. Note that Ignition will stall for a few seconds while the histogram is captured in most cases.
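To compare two histograms, something like the following could work. It is a minimal sketch that assumes the text output of `jmap -histo <pid>` saved to two files (rows of the form `num: #instances #bytes class name`); the helper names are invented for the example.

```python
import re

# One data row of `jmap -histo` output, e.g.
#    1:        123456       98765432  [C
# Header, separator and "Total" lines do not match and are skipped.
HISTO_ROW = re.compile(r'^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)')


def parse_histogram(text):
    """Return {class name: (instances, bytes)} from jmap -histo text."""
    rows = {}
    for line in text.splitlines():
        m = HISTO_ROW.match(line)
        if m:
            rows[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return rows


def top_growth(before_text, after_text, limit=10):
    """Return (byte growth, instance growth, class name), largest growth first."""
    before = parse_histogram(before_text)
    after = parse_histogram(after_text)
    growth = []
    for name, (instances, size) in after.items():
        # Classes absent from the earlier histogram count as starting at zero.
        old_instances, old_size = before.get(name, (0, 0))
        growth.append((size - old_size, instances - old_instances, name))
    growth.sort(reverse=True)
    return growth[:limit]
```

The classes at the top of `top_growth(before, after)` are the ones whose retained bytes grew the most between the two captures, which is usually where to start looking for the leak.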
The named query cache maintenance system got revamped (for performance) in 7.9.9. That’s the root cause of the issue.
You probably didn’t encounter the issue on your dev system because without active users you wouldn’t be building up a significant cache. Since 7.9.9 currently has an RC out, the release should be out within a couple weeks.
So will turning off caching on my named queries when I create them help mitigate the issue I am currently having?
Is it possible to run clearNamedQueryCache from the script console of my designer in order to clear the cache?
I believe disabling caching should help, but I am not very familiar with the caching mechanism so I’m not entirely sure what the change involved - only that it was specifically designed to fix this kind of issue.
A snapshot after reboot isn’t much help. A snapshot while the problem is occurring would be helpful. But also post your ignition.conf, as requested in the other post.
P.S. Please don’t post in multiple topics for the same problem. This should all be in the new topic.
We have a relatively large project: 7.9.9, MES-heavy (Sepasoft), 4 sites running on a single Gateway with over 180 clients, 160 queries/sec, and 143 devices. The Gateway runs on an ESX Cluster Node, with 8 other VMs that use very little resources (our VM is by far the top consumer).
I have attached a copy of our ignition.conf file and a thread dump taken during a CPU spike: thread_dump.txt (619.2 KB), ignition.conf.txt (8.4 KB)
You have both CMS and G1GC set in your ignition.conf. G1GC is last, so that was chosen, but I recall reading that CMS settings can interfere with G1GC. Get rid of the wrapper.java.additional.1, .2, and .3. Add a target pause time. Consider logging garbage collector performance to determine if that is the cause of your clock drift warnings. See these topics: