Since we rolled out v8.1.0 and Perspective to one of our largest clients (280k tags, 70 Vision clients, and now 8 Perspective clients running inside Vision’s Web Browser component), we’ve had non-stop issues with extremely high server CPU usage (90-100%) and incredibly slow clients (due to server CPU). We’ve been in contact with support, but have yet to find a reliable cause and solution. We removed all Alarm Status Table components from the Perspective Views, as these seemed to be causing excessive and persistent perspective-worker threads (we had 1050 threads for 8 Perspective sessions at one stage!!). Removing these has reduced this number to ~50-100, and we thought we had identified the issue, as we were consistently averaging ~25% CPU (min 16, max ~45) for 3 days, but it’s recently increased to 40-50%. We’re unsure of the reason.
Anyway, is anyone else experiencing any of these issues? Or is it just me…? Did you resolve them?
I think @PGriffith has said that the Perspective clients actually scale much better than Vision clients. You didn’t have any issues prior to Perspective, performance-wise?
The issue as I see it is that, as nice as it is that you can develop and push from Ignition for both Vision and Perspective clients (and don’t get me wrong, it is very nice), it does come at a cost - the Ignition server is acting as both backend and frontend server (for Perspective at least; I think with Vision it’s more of a traditional Gateway=Backend, Vision Client=Frontend that handles itself).
Do you have a lot of polling, or timer scripts or anything similar on Perspective? Remember, all scripts etc. on Perspective are run on the gateway, not the browser/client. So if you have a very inefficient script on some view, x8 sessions, now your gateway is trying to handle all of that. Now imagine if that’s being run on a timer. Do you have anything that might meet that criteria?
I wish I could help you more. I have my own issues with perspective. I really love Ignition and think Perspective will get there in time, but atm, if I was asked to create a web front end to be used in a professional setting, I would set up a Web Dev module to create a backend Ignition API and create a separate front end server, to help take the load off of the gateway.
system.util.threadDump on the gateway (attempts to) capture per-thread CPU information; you could set that up to automatically trigger every few minutes when CPU load gets high and log those somewhere. It would be interesting to see the culprits in those threads.
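For what it's worth, here is a rough sketch of what that could look like as a gateway timer script. The helper name `capture_dump_if_busy`, the 80% threshold, the 5-minute throttle and the output folder are all made up for illustration; the only Ignition call it relies on is `system.util.threadDump()`, which is passed in as `dump_fn` so the logic can be tried outside the gateway. The CPU tag path in the trailing comment is my assumption about where the gateway exposes CPU load, so check it (and its scaling) on your own system.

```python
import os
import time


def capture_dump_if_busy(cpu_percent, dump_fn, out_dir, threshold=80.0,
                         min_interval_s=300, state=None, now=None):
    """Write a thread dump to out_dir when CPU is at or above threshold.

    dump_fn is any zero-argument callable returning the dump as text
    (on a gateway this would be system.util.threadDump).
    state is a dict the caller keeps between runs so dumps are throttled
    to one per min_interval_s seconds.
    Returns the path written, or None if no dump was taken.
    """
    if state is None:
        state = {}
    if now is None:
        now = time.time()

    # Nothing to do while the gateway is not under load.
    if cpu_percent < threshold:
        return None

    # Throttle: skip if the previous dump was taken too recently.
    last = state.get("last_dump")
    if last is not None and now - last < min_interval_s:
        return None

    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    path = os.path.join(out_dir, "thread_dump_%s.txt" % stamp)
    with open(path, "w") as f:
        f.write(dump_fn())

    state["last_dump"] = now
    return path


# Hypothetical use inside a gateway timer script (assumed tag path and
# folder; verify whether the tag reports a fraction or a percentage):
# cpu = system.tag.readBlocking(["[System]Gateway/Performance/CPU Usage"])[0].value
# capture_dump_if_busy(cpu * 100, system.util.threadDump,
#                      "C:/ignition_dumps", state=system.util.getGlobals())
```

The throttle is there so that a gateway pinned at 100% doesn't get even busier writing a dump on every timer tick.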
In my experience, lots of alarms (10K for example) combined with too high a polling rate for the alarm status, or too large a date range for the alarm journal, can cause Vision client freezes or very high gateway CPU usage for Perspective clients.
IMO, that many alarms is not a common case in production, but it can sometimes occur at startup or in other situations. Displaying that many alarms is of no use to anyone, yet it can lead to very high CPU load on the client or gateway.
IMO, a good solution for Ignition stability would be the ability to set a LIMIT on the number of alarms displayed in the alarm status and alarm journal components in Vision and Perspective, with a flag to indicate that the alarm results have been truncated.
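To make the idea concrete, here is a tiny sketch of the behaviour I mean. This is not an existing Ignition feature; `limit_alarms` and `max_rows` are hypothetical names, and `alarms` is just any list of alarm rows.

```python
def limit_alarms(alarms, max_rows=500):
    """Return at most max_rows alarms plus a flag saying whether any were cut."""
    alarms = list(alarms)
    truncated = len(alarms) > max_rows
    return alarms[:max_rows], truncated


# Example: 10K active alarms, but only the first 500 are sent to the table.
rows, truncated = limit_alarms(range(10000), max_rows=500)
```

The component could then show something like "results truncated" whenever the flag is set, instead of trying to render every row.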
Yeah, but these don’t have the CPU information on them - that’s probably the most useful diagnostic axis. It’s impossible to say for sure what’s most “expensive” based on just the stack trace.
How do I get these? (thread dumps have been the only logs requested by support)
Thread dumps captured via system.util.threadDump() will have CPU information. I have no idea why the other output formats don’t, but it is on our agenda to fix.
Awesome, I’ll get onto it ASAP. Cheers! This is causing massive havoc!!
Wanna try something a little different?
Add this additional parameter to ignition.conf and then restart. After 5 minutes get the ignition.jfr file that got created (in wherever $IGNITION is I think) and upload/DM it to me.
Will do! Cheers, will just need to coordinate with site but shouldn’t take long.
I think you’ll need to be sure and wait the full 5 minutes. If you copy that file early I may not be able to read it.
Also, do I need to remove that after and reboot? or ok to leave?
It’s supposed to stop recording after 5 minutes (300s in the line), but… this is somewhat new to me. Can’t verify that there is no performance hit after it stops recording.
and hopefully 5 minutes is enough to get past startup and into your application / Perspective sessions bogging everything down! Otherwise we might need to extend that.
One thing to note: when we started seeing the CPU average go up again a couple of days ago to ~50%, we had added another Ignition gateway to the gateway network, as we wanted to use EAM with it to sync up changes. We originally had no gateway network set up prior to that and no EAM module installed. We’re using trial EAM on the main GW and licensed Edge EAM on the other (we have a quote and are in the middle of purchasing EAM for the main GW). I disabled the GW network an hour ago to see if there’s any change.
I set it for 400s just in case. Ignition rebooted and counting. ~7 mins and I’ll get the log to you, cheers!
Also, here’s a thread dump using the scripting command. I don’t see any CPU usage in there, unless I’m not reading it right: thread_dump_20210311_0909.txt (39.7 KB)
Edit: scratch that, it’s showing up now
Well that was an interesting experiment but too distorted by all the things that happen when you startup a real gateway…
You do seem to have a lot of alarm status query stuff happening all the time though. Those showed up in the thread dumps too. But I’m not sure if that’s related to the growth of Perspective worker threads and overall CPU usage growth that came with adding Perspective…