Gateway reboot itself. What is the reason?

Michal_Goralczyk · December 28, 2022, 11:07am

Hello,
a while ago my Gateway Ignition spontaneously rebooted what I suspected in the logs I am posting below. What was the reason for the Gateway reboot?

What it means:
Clock drift, degraded performance, or pause-the-world detected. Max allowed deviation=1000ms, actual deviation=1864ms

Why did I experience such deficiencies in the performance graphs?

What is the reason for the LicenseManager license error:

Best regards,
Michal

pturmel · December 28, 2022, 1:30pm

I don't see anything obvious in what you've posted. That sequence of clock drift errors suggests that something external is starving Ignition of CPU time or interfering with its use of memory. Is this a VM? Is it overcommitted in any way? Does the machine have 40G or so of RAM allocated? (Extrapolating from an Ignition heap size of 24G.)

Michal_Goralczyk · December 28, 2022, 2:28pm

That's right, we use a virtual machine. On the IO server we are starting to have problems reading the tags which is why I am going into detail. Theoretically as far as I know the ability to increment the tags on the IO gateway is only limited by performance but I wonder if this is true because there are times when the gateway needs to be restarted or it does this on its own I'm trying to figure out the cause.

For example:
During tag reads, there are situations when tags take the value Uncertain despite the fact that there is communication with devices in Ignition as well as Kep server. After this reboot, everything returned to normal and will continue to work for some time until the problem starts to grow.

In the last screen shot you can see the history provider error which also appeared two days ago and disappeared after the reboot.

How can I diagnose the cause of clock drift eliminate it?
The VM is only burdened by the IO-gateway for reading and handling tags, the VM is dedicated entirely to this purpose. As you can see, the load is at a reasonable level - so it seems to me both CPU and RAM. The resources of the VM itself have also been allocated well in excess. From what we have been able to find out the maximum reasonable RAM allocation for Ignition is some 32GB of RAM. I'm having a hard time understanding the problems that are occurring and removing the cause .

IO gateway has about 700k tags, about 500 Ignition OPC devices, 1 kepserver and 40 OPC server as PLC reading.

Best regards,
Michał Góralczyk

pturmel · December 28, 2022, 4:32pm

OK, but you didn't answer this:

"Overcommitted" is when the physical hardware has "X" resources and the VMs in that hardware add up to "X" or more. CPUs and RAM in particular. When a physical host for VMs has too many VMs defined, then VMs get starved a bit at a time. The shuffling of overcommitted resources can create brief halts in the VMs even though the VM thinks its CPU usage is low.

How do you know this? What is the ram allocation for the VM? (Ignition should not be told to use all of it. Maybe half.) Also, make sure Ignition claims all of its RAM up front--initial heap and max heap should be equal in ignition.conf. This ensures that the VM's OS doesn't cause random stalls when Ignition asks for more RAM.

Consider not giving Ignition quite so much RAM. From your trend, it looks like 14G to 16G should be fine. (I recommend striving for a sawtooth pattern that cycles in the 50% to 80% range.)

Consider setting a maximum GC pause target in your ignition.conf file, if you haven't already. I usually use 100ms. This minimizes the chance that GC will produce clock drift warnings.

pturmel · December 28, 2022, 4:54pm

Oh, and I'm not sure about the history provider log entries, but that is usually a database issue, not an Ignition issue. Unless the DB is also in the same VM, or it is in a VM on the same overcommitted hypervisor.

Michal_Goralczyk · December 28, 2022, 5:55pm

Thank you for your reply.

We have one large physical hardware server on which virtual environments work, such as Ignition Frontend, Ignition IO server, potgreSQL database, reporting services etc. I check the right amount of resources using CPU and RAM performance charts. They seem to be sufficient for the intended purpose on any VM. I think it was foreseen because I always get information whether we have any resources left for virtual machines. However, what you wrote is very interesting, I will try to consult it with the department responsible for the operation of the server room and virtual machines.

When our IO server project was growing and the demand for resources increased, at some point we decided to allocate a large amount of resources so as not to do it too often. I decided that a large surplus of resources has a positive effect, and if not, it certainly can't hurt.

The VM has 36GB RAM Ignition is configured as below:

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=8196

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=24576

I can try to do that, but it seems odd that fewer resources might be better than too many resources.

I'll try to find it and test it.

The database is on a separate virtual machine. Oddly enough, everything was working fine until a certain point and a problem occurred which was fixed by restarting the gateway.

pturmel · December 28, 2022, 6:03pm

Not true, at least for RAM. Extra CPUs is good.

Michal_Goralczyk · December 28, 2022, 6:33pm

I will try to use your advice thank you for your help and knowledge