Using jmap to diagnose heap-related issues in Ignition?

Try running them from a console started as administrator. I don’t know why else they wouldn’t work and I’m not really familiar enough with Windows to have any other guesses.

FWIW, when you run jps Ignition may show up as “WrapperSimpleApp”.

When I run jps from the command line on the Windows server, all I appear to get is the PID of jps itself and nothing else.

[screenshot of jps output]

Hmm. Probably something to do with Windows and Ignition running as a service :confused:

I wonder if you’d have better luck with a tool like VisualVM.

This thread suggests some potential workarounds… I guess it is a bit more difficult on Windows.


I went and looked at the suggested thread. It seems like an entire project just to get the Java utilities to run under Windows. Thanks for the suggestion though. I haven’t gone down the VisualVM path; I’m not sure what it is or what is required to install it. The target system is in production and will not tolerate a restart, so if a system restart is required I can’t go down that path.

Another issue I might be encountering is that my JDK is not the same version as the runtime environment; it’s actually a slightly higher version.

I wanted to get back to you on your jmap suggestion. I did finally get a heap dump using jmap; it just took some experimentation. I was able to use netstat -ano to locate the PID I needed to give to jmap. The key is that your Windows cmd window needs to be running as administrator. I used the command jmap -dump:file=C:\your path here\heapdump.bin <pid> to generate the dump file, which takes a lot of disk space (basically the same size as your heap). Using jhat to analyze the file also took some experimentation. It turns out that jhat requires a HUGE heap space for the analysis. By adding -J-mx16G to the jhat command to give it 16 GB of heap, it was finally able to analyze the file and generate its output. For a 4 GB heap it took nearly 30 minutes to generate the results.
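
For anyone following the same path, here is a sketch of the command sequence described above (run from an administrator cmd prompt; the dump path, port filter, and PID are placeholders):

REM locate the gateway JVM's PID (8088 is the default gateway web port; adjust as needed)
netstat -ano | findstr :8088

REM write the heap dump; expect a file roughly the size of the live heap
jmap -dump:format=b,file=C:\dumps\heapdump.bin <pid>

REM analyze the dump; jhat itself needs a very large heap (16 GB here)
jhat -J-mx16G C:\dumps\heapdump.bin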

Thanks for pointing me in the right direction.

Still working through support to locate the heap consumer.

Kevin,

What you describe is exactly what we’re dealing with over here: Windows server, gateway slowly takes more and more memory, though sometimes we’ve seen rapid jumps to a much higher usage regime that then stays essentially static.

Were you able to locate the problem, and were you able to fix it? And how, in both cases?

Unfortunately no, we have not gotten to a conclusion at this time. I’ve had a case open with Ignition support since mid-June on this issue. At first we were directed to change our garbage collector to G1GC and to increase the heap from 2 GB to 4 GB. It did change the shape of the heap memory trend a bit but did nothing for the original problem. Increasing heap space did nothing except extend the period between gateway restarts.
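
For anyone else making those changes: both are typically set in the gateway’s ignition.conf (Java Service Wrapper) file. A sketch, with the caveat that the wrapper.java.additional index number must be whatever slot is free in your particular file:

# maximum gateway heap, in MB (raised from 2048 to 4096)
wrapper.java.maxmemory=4096
# extra JVM argument selecting the G1 collector
wrapper.java.additional.3=-XX:+UseG1GC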

I’ve gotten little else in response. I will occasionally get a reply if I ping support asking for status; this has been ongoing now for 2 1/2 months.

I’ve sent multiple heap dumps from jmap.

On my last contact I tried to see if there was a way to escalate the case due to its longevity. I was told there is no escalation process in place. I was also told that I would get a better response if I called rather than exchanging emails. I don’t understand that, but keep it in mind if you open a case in the future.

I’m currently experimenting with the notion that if I get away from the Run Always SFC, the garbage collector might be able to do a better job of cleanup. The jury is still out on that experiment, as I’ve only had the new solution in operation for 4 days in our office and it takes longer than that to tell whether there is an improvement.

I certainly would like to hear that somebody is actively investigating this issue.

My particular application is socket-based, using the socket functionality of Python/Java. I’ve read a number of articles that seem to indicate that special handling is required for the byte buffers used by the Java socket implementation, and I was wondering if that might be where my problem lies. The heap dump shows a huge allocation for Java socket-related classes.

That might explain support's inability to help. If you are rolling your own sockets, you are responsible for all of the related object lifetimes, many of which require explicit close operations to clean up. (Every single socket opened should be managed by a single thread that guarantees closure with a try-finally construct.)
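
A minimal sketch of that pattern in Jython, using the Python socket module and placeholder names (not the actual library discussed here):

import socket

def send_command(ipaddr, port, cmd, timeout=5.0):
    # Open, use, and close the socket in one place so nothing leaks on error paths.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect((ipaddr, port))
        sock.sendall(cmd)
        return sock.recv(4096)   # 4096-byte read is illustrative
    finally:
        sock.close()             # runs even if connect/sendall/recv raises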

I have taken every action imaginable for destruction/disposal of these socket-related items. The connection is being shut down, closed, and deleted on every use.

Does this mean you are opening these connections often? Does the protocol require new connections often? Is there any reason you aren't using a long-lived socket with an assigned thread? Are you using Netty and not explicitly releasing its ByteBufs?

Is there any chance of putting together a small, self-contained example of this problem?

If you suspect the problem has to do with the usage of sockets in Jython scripts in SFCs, then a simple example that reproduces the leak might be possible?

My initial choice was to keep the connection open until there was a reason to close it, such as an error condition or a disconnect. That is typically the approach I use in all other environments. The manufacturer of the server-side software informed me, after we had issues with their interface, that I should create a new connection for each transaction and tear it down when complete. So I modified my client side to do so, and their server side did indeed seem to operate better when configured in this manner. That’s how I got here, not by choice.

I’m not familiar with Netty.

In my function library I’m using the following imports:

import select
import socket

In my code I’m using the following socket operations:

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((ipaddr, port))
mysock.setblocking(0)                        # switch to non-blocking mode
mysock.sendall(cmd)
select.select([mysock], [], [], timeout)     # wait for the socket to become readable
data = mysock.recv(4096)                     # buffer size shown for illustration
mysock.shutdown(socket.SHUT_RDWR)
mysock.close()
mysock = None

I’m open to anything at this point because I don’t seem to be getting anywhere fast.

I do have an experiment running in the office that is using a callable SFC run off of a timer to accomplish the same thing. It’s been running for a week so far, but it’s way too soon to see if it improved the situation. My thinking is that in the Run Always SFC the collector wasn’t getting a chance to do a thorough cleanup.
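
For reference, kicking off a callable SFC from a gateway timer script looks roughly like this; the chart path and parameter values are placeholders:

# start one run of the callable chart each time the timer script fires
system.sfc.startChart("MyCharts/SocketPoll", {"ipaddr": "10.0.0.5", "port": 502})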

Okay.

The first thing I’d do is replace all the Python socket stuff with the Java equivalents. There may be a Jython bug there. As I noted to someone else recently, if there are both Python and Java ways to do something in Jython, always use the Java method.

I haven’t been down the Java sockets path before, but I guess it’s something I can try.

Kevin,

Thanks. This doesn’t seem to be exactly what we are seeing. While there does seem to be some long-term drift, we’re realizing that the bulk of the high memory usage appears very rapidly, and it’s correlated with losing connection to an OPC server. I don’t expect this to have relevance to you, but I mention it just in case.

Yes, we’ve also noticed that strange things happen within Ignition when heap usage rises.

We noticed today that on at least one of our installations the heap is being consumed as well, and it isn’t running the SFC that I’ve been concerned with. So there are apparently other heap-gobbling culprits out there.

To be continued…

So, after exhausting many suggested paths to solving the mystery of the disappearing heap memory, it appears to have been solved. The actual root cause is still unknown at this time, but a work-around has been developed and seems to be effective.

Options suggested & tried which failed to remedy the issue:

  • Change garbage collection to G1GC
  • Increase maximum heap to allow more space for G1GC to operate within
  • Explicitly delete ALL Python socket-based items after use
  • Change Run Always SFC to callable SFC

The work-around that is currently in place, and functioning well, basically replaced ALL of the Python socket/select-related items with their Java equivalents and associated classes.

The new function library that was created relies upon Java sockets, input streams, and output streams. All references to the Java socket items are explicitly removed as a precaution when no longer in use.
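
A minimal sketch of what such a Jython function might look like, using java.net.Socket with the same try-finally discipline suggested earlier (names, buffer size, and timeout are illustrative, not the actual library):

from java.net import InetSocketAddress, Socket
from java.lang import String
from jarray import zeros

def send_command(ipaddr, port, cmd, timeout_ms=5000):
    # One connection per transaction, per the server vendor's guidance earlier in the thread.
    sock = Socket()
    try:
        sock.connect(InetSocketAddress(ipaddr, port), timeout_ms)
        sock.setSoTimeout(timeout_ms)
        out = sock.getOutputStream()
        out.write(String(cmd).getBytes('ISO-8859-1'))   # send the command bytes
        out.flush()
        buf = zeros(4096, 'b')                          # reply buffer size is illustrative
        n = sock.getInputStream().read(buf)
        return ''.join(chr(buf[i] & 0xff) for i in range(max(n, 0)))
    finally:
        sock.close()                                    # always release the underlying socket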

The functions within the new library are being called from within a Run Always SFC.

So, it would appear there is some type of issue in the Python (Jython) socket implementation that prevents the remnants from being cleaned up by garbage collection after they are released.
