Migrate Ignition Server to Azure - High CPU on VM from Java(TM) Platform SE Binary Processes

I moved a production Ignition server and its projects over to a VM on Azure using a gateway backup. The Azure VM is the same size as the on-prem VM that Ignition previously resided on (16GB RAM, 2 cores, 128GB disk). The gateway CPU and memory allocation looks fine (see below). I am running Windows 10, Ignition 7.9.10, and Java 8 Update 221. However, I have the clock drift issue, which seems to be causing problems at the client level (8 open clients): users have reported slow and laggy updates on power tables. This was not an issue with the on-prem VM.

Upon further investigation, I can see through Task Manager that the VM's CPU is at 100%, caused by several instances of Java(TM) Platform SE Binary processes. I don't want to upsize the VM yet without first understanding this issue. I changed the initial and max heap sizes in the ignition.conf file to 2048 and 4096, respectively, with no change in performance. Any help would be appreciated.
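For reference, these are the wrapper properties I changed in ignition.conf (the comment text is as it appears in the stock config):

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=2048

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=4096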


Also was reading this article:
https://support.inductiveautomation.com/index.php?/Knowledgebase/Article/View/92/2/clockdriftdetector-warnings-in-the-gateway-console

Are we looking at the same screenshot? More than half the CPU usage is your MySQL database…

Yes, I realize that; that's a separate issue I am working on understanding. But the other 50% is from one of the Java binaries. Does this correspond to one of the clients being the issue?

Are the clients remote or running on the same VM?

They are remote:

Then that JVM instance might be the Ignition Gateway; I don't know how to get more details from Task Manager.

If the load is spiky like you’re seeing it might correspond with large history queries, which could also explain the DB load.
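If you want to see whether large history queries line up with the spikes, the MySQL slow query log is a cheap way to check (just a suggestion; both variables are dynamic in 5.7, and the 2-second threshold is only a starting point):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;  -- log statements slower than 2 seconds (applies to connections opened after this change)
SHOW VARIABLES LIKE 'slow_query_log_file';  -- where the log is written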

You might be able to catch some CPU usage on the threads page of the Ignition gateway.

The problem instance is listed under Background Processes; the instance under Apps shows 0%. Would the instance under Apps be the Gateway, or not necessarily?

I’d imagine the gateway is the one under “Background process” since it runs as a service but I don’t know for sure.
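One way to check is to dump the command lines of the java.exe processes from a command prompt; whichever one points at the Ignition install directory is the gateway (the Details tab in Task Manager can also show a Command line column):

rem list the command line of every java.exe so you can spot the gateway JVM
wmic process where "name='java.exe'" get ProcessId,CommandLine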

Thanks. Under Threads, nothing appears to be anywhere near the order of magnitude that would cause an issue:

I'm sure it's something you've thought of, but it worked fine on an on-prem server and is giving issues after moving to a cloud server, so how is your internet connection? If you have any issues with your connection, or anything slowing it down, could that be your problem? If your facility is using a lot of bandwidth, could that be causing lag on your connection to Azure? If everything worked fine on a local server but is giving issues on a cloud server that's set up the same way, I would think it may be related to your connection to that server. That's just an initial thought, not knowing anything about your connection.

Even without that, though, I'd get MySQL moved to a separate machine, even if only temporarily for troubleshooting, so you can take that load out of the equation and see how your CPU usage responds. With the CPU at 100% I would expect to see clock drift even if you're not doing much with the gateway.
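For a quick test you don't even have to move it yet; stopping the service from an elevated command prompt takes that load out of the equation (the service name is an assumption here; MySQL57 is the default for a 5.7 install):

rem MySQL57 is the default 5.7 service name - check services.msc if yours differs
net stop MySQL57
rem watch the gateway CPU for a while, then bring the database back
net start MySQL57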

Didn't think about that, but I'm not sure about the best way of measuring this. VZ speed test from my local machine:

Ping Azure server: Avg round trip 18ms
Ping On-prem server: Avg round trip 0ms

What is the innodb_buffer_pool_size on the MySQL server? Did you change it from the default?
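If you're not sure, you can check the running value with:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';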

It is currently set to 268435456 (256MB). I am running MySQL 5.7.24.

I changed it to 7GB following this:
https://dba.stackexchange.com/questions/27328/how-large-should-be-mysql-innodb-buffer-pool-size
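i.e. in the [mysqld] section of my.ini (on Windows that file typically lives under C:\ProgramData\MySQL\MySQL Server 5.7\, but check your install):

[mysqld]
# takes effect after the MySQL service is restarted
innodb_buffer_pool_size=7G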

I restarted the MySQL service after updating innodb_buffer_pool_size, which reduced the buffer usage, but it did not change the CPU consumption of the VM.

So I stopped the MySQL service, but the JVM is still very spiky, ranging from 6% to 60%.

Just got off the phone with support; it looks like it is a provisioning issue with the number of cores allocated to the VM (the host is 2-core while the guest only sees 1). I put in a request to bump it up to 4 cores. Hoping this resolves the issue; I will post back when the change is made.

You might want to compare the ignition.conf from your on-prem setup.
The Garbage Collector and Memory usage config is probably different.

Look for something like this:

# Java Additional Parameters
wrapper.java.additional.3=-XX:+UseConcMarkSweepGC

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1024

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=3072

They were initially the same, but I increased the initial and max heap sizes on the Azure instance.
On-prem:

Azure cloud:

You really – really – don’t want to be using the Concurrent Mark and Sweep Garbage Collector. Ancient algorithm with poor performance. Search this forum for “G1GC” and follow the recommendations.
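In ignition.conf that's just a matter of swapping the flag on the same slot and restarting the gateway service, roughly:

# Java Additional Parameters
# was: wrapper.java.additional.3=-XX:+UseConcMarkSweepGC
wrapper.java.additional.3=-XX:+UseG1GC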


Thanks, I switched to the G1GC parameters in the ignition.conf file, but I'm not quite sure I understand the changes that need to be made on the client side. Can I either limit the allowed JREs to 8+ in the gateway settings, as shown here?

Or add an argument to the native client launcher? Where can this be done, and does it need to be done on a client-by-client basis? I have new clients go to the gateway homepage, launch the project, then open the file that creates a desktop shortcut (i.e. I have the native client launcher disabled).

There’s almost never a reason to go through the hassle of enabling G1GC on clients - they don’t run for long enough, or with enough memory dedicated, to run into significant problems.

Yes, but you'd need to limit it to 9+ if you wanted to force G1GC on clients (doesn't affect the gateway). EDIT: this applies to forcing G1GC without using the jvm argument (you don't need Java 9 to use the jvm argument).

Yes, add jvm-args="-XX:+UseG1GC" to the end of the shortcut target like this:
[screenshot: shortcut target with jvm-args="-XX:+UseG1GC" appended]

Yes. But I agree with @PGriffith that it's usually not needed. I only add it for auto-launched clients on machines with minimal available memory, or when I happen to be setting up a link for someone.