Large, Random Lag Spikes. Related to Server Configuration?

Hi there,

I have a question regarding some issues we’ve been experiencing. We have a server running around 10 different projects and controlling Vision clients on various machines. We’ve been noticing some pretty severe lag spikes around the facility (sometimes 20+ seconds). We haven’t had any of the HMI screens come up with Gateway errors; it just seems to be weird behavior with the HMI screens themselves. It’s not related to a specific project. We’ve also checked PLC load, client hardware, and network traffic, and none of those seem to be related to the issue. I think at this point, this post is my only clue:

The server we are using has mechanical SCSI drives in it running in RAID 1. Is it a possibility that the mechanical drives are the cause of this issue we’ve been experiencing? I’ve checked drive health, and all seems to be good, but because project files are stored on the drive, it seems to be the only thing I can really think of.

Would anybody have any suggestions?

In 8.0, the project resources system scans the filesystem for changes at a fairly aggressive rate, which often leads to significant CPU usage (and other negative side effects). This could be the root cause of what you’re experiencing, especially with mechanical hard disks involved.

The easiest fix is to upgrade to any 8.1 version, where we relaxed that check to a 5-minute rate unless you opt out via a system property. I don’t think you can change that scan frequency on 8.0.X at all, unfortunately…

EDIT: Actually, it looks like it should be doable on 8.0.17; add a wrapper.java.additional parameter to the ignition.conf file, like this:
wrapper.java.additional.X=-Dignition.projects.scanFrequency=300

That’s perfect. Apologies, I think I tagged this post incorrectly; I’m on version 8.1.0. Whereabouts would I be able to find the opt-out property? Or should I still use the wrapper?

As an additional tidbit, it turns out the config.idb was set to scan and potentially back up every 2 minutes, so adjusting this should help as well, I think. The .idb file is about 145 MB, so when it did back up, it was taking around 10-25 seconds. That could lead to the spikes I’m seeing as well.

If you’re on 8.1.0, then you’re already set to a 5 minute scan rate, so that’s unlikely to help.
I would recommend updating to latest 8.1.X as soon as possible, though. There’s been a ton of performance (and other) improvements since 8.1.0.

The autobackup mechanism for the idb should be mostly asynchronous - a 145 MB database is pretty unusual in 8.0+, since projects are no longer stored in the IDB. Do you already have a support ticket going for this? I’d be curious what’s taking up that much space in the file. (You can find out yourself using the sqlite_tools available on their website: SQLite Download Page, but it’s usually easier to have support do it).
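If you want a quick first look yourself before involving support, Python’s built-in sqlite3 module is enough to enumerate the tables and row counts from an extracted copy of the idb. This is just a sketch - the `config.idb` path in the example is an assumption, so point it at wherever your copy lives:

```python
import sqlite3

def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in an SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master is the schema catalog; type = 'table' skips indexes etc.
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Quote each identifier in case a table name needs escaping.
        return {t: con.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        con.close()

# Against an extracted internal database (path assumed):
# for name, rows in table_row_counts("config.idb").items():
#     print(f"{name}: {rows} rows")
```

Keep in mind row counts don’t tell you about disk usage - a few rows holding big blobs can dwarf a table with thousands of small ones, which is where the analyzer tool earns its keep.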

Ok that’s good to know, thank you. I’ll do the update to newest 8.1 version as soon as possible. It’s a production server, so I am unsure as to when I will be able to do this.

I did have a support ticket open for this issue, I spoke to somebody earlier who did point out the .idb file size is large, but didn’t say it was anything unusual, just that we have a larger than average project.

We use SQLite here frequently, so I already have the tool installed. Looking through the database, it’s actually pretty surprising how large the file is for the table sizes. The largest table I have is just over 15K entries in the ‘SQLTAGPROP’ table, and just over 12K in the ‘TAGCONFIG’ table. There are a few other tables in the thousands, but the file size does seem large for the number of entries we have.

Hmm, what does sqlite_analyzer say for disk usage per table? The (new) tag tables are just JSON strings - while they can get relatively large for complex UDTs, it’s still hard to see how that would reach 145 MB.

[screenshot: sqlite_analyzer output for the tagconfig table]
Here’s the largest table we have, at 5.0% of the total database. Doesn’t seem too unreasonable.

EDIT: Apparently I missed one, the project resources table. That seems like a problem, considering it only has 3100 entries.

Well, the good news is PROJECT_RESOURCES is totally ignored in 8.0+, so it should be safe to remove it. You’ll need to do some surgery on a gateway backup - you can open the .gwbk as a zip, extract the config.idb, drop the PROJECT_RESOURCES table via sqlite, then pack it all back together.

Obviously, keep an unmodified .gwbk around to restore from just in case, but it should be a pretty safe operation if you’re familiar with SQLite. I can’t guarantee it’ll do anything about your core CPU usage issue, but it might be interesting to try separately from the upgrade.
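For anyone who wants a concrete sketch of that surgery, here’s roughly what it looks like with nothing but the Python standard library. To be clear, this is an illustration, not an official tool - it assumes the .gwbk is a plain zip with config.idb at its root, as described above, and it writes a new file rather than touching your original:

```python
import os
import shutil
import sqlite3
import tempfile
import zipfile

def drop_table_in_gwbk(gwbk_path, out_path, table="PROJECT_RESOURCES"):
    """Write a copy of a gateway backup with one table removed from config.idb."""
    workdir = tempfile.mkdtemp()
    idb_path = os.path.join(workdir, "config.idb")

    # 1. A .gwbk is a zip archive; pull config.idb out of it.
    with zipfile.ZipFile(gwbk_path) as z:
        with z.open("config.idb") as src, open(idb_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

    # 2. Drop the table, then VACUUM so the freed pages actually leave the file.
    con = sqlite3.connect(idb_path)
    con.execute(f'DROP TABLE IF EXISTS "{table}"')
    con.commit()
    con.execute("VACUUM")
    con.close()

    # 3. Repack: copy every other entry untouched, then add the modified idb.
    with zipfile.ZipFile(gwbk_path) as zin, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename != "config.idb":
                zout.writestr(item, zin.read(item.filename))
        zout.write(idb_path, "config.idb")
```

Restore the resulting file through the gateway as usual, and hang on to the untouched original in case you need to roll back.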

Ok that’s perfect! I dropped the table and compacted the database, and we dropped down to 14MB. Waaaayyyyy better.
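In case anyone else tries this: dropping the table alone didn’t shrink the file - SQLite just moves the freed pages to an internal freelist - so the compact (VACUUM) step is what actually reclaims the space. A little standalone sanity check of that behavior (nothing Ignition-specific here; sizes will vary by SQLite build):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE big (data BLOB)")
# Roughly 4 MB of filler rows.
con.executemany("INSERT INTO big VALUES (?)", [(b"x" * 4096,)] * 1000)
con.commit()
before = os.path.getsize(path)

con.execute("DROP TABLE big")
con.commit()
after_drop = os.path.getsize(path)   # barely changes: pages sit on the freelist

con.execute("VACUUM")                # rewrites the database without the dead pages
after_vacuum = os.path.getsize(path)
con.close()

print(before, after_drop, after_vacuum)
```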

So next server maintenance day, it looks like I have a few things to do.

  1. Drop the table on the live server and compact the database.
  2. Adjust the backup times to be a little further apart.
  3. Rebuild the server storage using SSDs instead of HDDs.
  4. Upgrade to the newest Ignition 8.1.X.

Hoping that will solve my issues. Did I miss anything or does that look good to you?

Seems solid to me. As with any upgrade, make sure you’ve got a .gwbk from before you start, but it should be pretty painless. 8.1.10 is due out ~early September, if that influences your timing at all.

Perfect, that’s good to know as well. Thank you for your help!

Did you ever upload your logs somewhere?

No, I didn’t end up uploading logs. I had a support agent set up a remote session, and he looked at them without seeing anything out of the ordinary, other than the time it was taking to back up the .idb file.

Back for a follow-up. Over the past weekend, I dropped the table, adjusted the backup times, and upgraded us to 8.1.9. Operators have said they’ve seen a BIG jump in performance. So it looks like it’s almost entirely solved!

I wasn’t able to complete the SSD upgrade as our current server has SAS drives, and I don’t have the budget to replace those with SAS SSDs, so I will need to create a new array, but I believe that we nailed the problem. Thank you very much for your help!
