Using jmap to diagnose heap-related issues in Ignition?

Is there any chance of putting together a small, self-contained example of this problem?

If you suspect the problem has to do with the usage of sockets in Jython scripts in SFCs, then a simple example that reproduces the leak might be possible?

My choice was initially to keep the connection open until there was a reason to close it, such as an error condition or a disconnect. This is typically the approach I use in all other environments. The manufacturer of the server-side software informed me, after we had issues with their interface, that I should create a new connection for each transaction and break it down when complete. So I modified my client side to do so, and their server side did indeed seem to operate better when configured in this manner. That's how I got here, not by choice.

I’m not familiar with Netty?

In my function library I’m using the following imports:

import select
import socket

In my code I'm using the following socket operations:

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((ipaddr, port))
mysock.setblocking(0)                                   # switch to non-blocking after connecting
mysock.sendall(cmd)
ready, _, _ = select.select([mysock], [], [], timeout)  # wait up to `timeout` seconds for data
if ready:
    reply = mysock.recv(4096)                           # buffer size shown here is representative, not the actual value
mysock.shutdown(socket.SHUT_RDWR)
mysock.close()
mysock = None

I’m open to anything at this point because I don’t seem to be getting anywhere fast.

I do have an experiment running in the office that uses a callable SFC run off a timer to accomplish the same thing. It's been running for a week so far, but it's way too soon to see if it improved the situation. My thinking is that with the Run Always SFC the collector wasn't getting a chance to do a thorough cleanup.

Okay.

First thing I’d do is replace all the python socket stuff with the Java equivalents. There may be a jython bug there. As I noted to someone else recently, if there are both python and Java ways to do something in jython, always use the Java method.

I haven’t been down the java sockets path before but I guess it’s something I can try.

Kevin,

Thanks. This doesn't seem to be exactly what we are seeing. While there does seem to be some long-term drift, we're realizing that the bulk of the high memory usage happens very rapidly, and it's correlated with losing connection to an OPC server. I don't expect this to have relevance to you, but I mention it just in case.

Yes, we’ve also noticed that strange things happen within Ignition when heap usage rises.

We noticed today that on at least one of our installations the heap is being consumed as well, and it isn't running the SFC that I've been concerned with. So there are apparently other heap-gobbling culprits out there.

To be continued…

So, after exhausting many suggested paths, the mystery of the disappearing heap memory appears to have been solved. The actual root cause is still unknown at this time, but a work-around has been developed and seems to be effective.

Options suggested & tried which failed to remedy the issue:

  • Change garbage collection to G1GC
  • Increase maximum heap to allow more space for G1GC to operate within
  • Explicitly delete ALL Python socket-based items after use
  • Change Run Always SFC to callable SFC

The work-around that is currently in place, and functioning well, basically replaced ALL Python socket/select-related items with their Java equivalents and associated classes.

The new function library that was created relies upon Java sockets, input streams, and output streams. All references to the Java socket items are explicitly removed as a precaution when no longer in use.

The functions within the new library are being called from within a Run Always SFC.
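For illustration only, here is a minimal sketch of the shape such a library function might take. The function name, the timeout, and the assumption of a line-terminated reply are mine, not details of the actual library:

from java.net import Socket, InetSocketAddress
from java.io import DataOutputStream, BufferedReader, InputStreamReader

def transact(ipaddr, port, cmd, timeout_ms=5000):
    # Hypothetical sketch: one Java socket per transaction, torn down when done
    sock = Socket()
    out = None
    reader = None
    try:
        sock.connect(InetSocketAddress(ipaddr, port), timeout_ms)
        sock.setSoTimeout(timeout_ms)                # bounded reads instead of select()
        out = DataOutputStream(sock.getOutputStream())
        out.writeBytes(cmd)                          # assumes cmd is an ASCII command string
        out.flush()
        reader = BufferedReader(InputStreamReader(sock.getInputStream()))
        return reader.readLine()                     # assumes a line-terminated reply
    finally:
        # Explicitly close and release every Java reference, as described above
        if reader is not None:
            reader.close()
        if out is not None:
            out.close()
        sock.close()
        reader = None
        out = None
        sock = None

The try/finally ensures the socket and streams are closed even if the read times out or the peer drops the connection mid-transaction.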

So, it would appear there is some type of issue surrounding the implementation of the Python socket classes that prevents the remnants from being cleaned up by garbage collection after they are released.


Did you ever figure out the step-function change in memory usage? We see it every now and then as well and have never been able to figure it out. The only way of resolving it has been a restart of the Ignition service. We're on 7.9.14.

Unfortunately, no. We most recently had a memory-use jump on Sept 8, 2020. It seems to track issues with our PLCs; a power outage that knocked down some connections the gateway was expecting caused one instance.

We’re on V8.0.6, have changed our Garbage collection algorithm, and allow 16GB of heap space. Have not tried replacing Python sockets with Java

I recently went back to visit the site that I originally had the issue with and all seems to be in order. However, that is probably not of much help as it is still running under 7.9.7.

There is a very regular sawtooth trend in the heap memory graphic, created by the G1GC garbage collector.