Multiple OPC connections faulting at the same time

scotteolson3 · November 1, 2021, 9:54pm

I am having a persistent issue where my OPC connections all drop at the same time. We currently have 8 connections; 4 of them drop simultaneously and have to be manually reconnected (of the other 4, 3 are intentionally deactivated and 1 is Ignition’s default connection).

I had some issues relating to CPU/RAM capacity that I thought were causing the problem, as CPU usage would easily hit 90%+, but after upgrading the cloud instance the problem persists (though less frequently).

I have some of the OPC-related loggers set to DEBUG and capture events like this:

I do have the SessionFsm DEBUG info as well:

Which logger or loggers should I change the level of to better capture the event that is causing this? Any help in diagnosing the issue would be appreciated. It is occurring with both a Siemens device and multiple Beckhoff devices.

Kevin.Herron · November 1, 2021, 9:57pm

The SessionFsm logger is already a good start. It’s pretty clear that the disconnect happens because the keep-alive requests stop getting responses.

A Wireshark capture would be a good supplement if you can disable security for the connections so we can decode the traffic.

What versions of Ignition are you using?

scotteolson3 · November 1, 2021, 10:00pm

Currently running on 8.1.7

I will work on getting a Wireshark capture, that is a good idea.

The complication in our system (and a potential culprit) is that we run our gateway in the cloud and connect our local devices via a site-to-site VPN.

Kevin.Herron · November 1, 2021, 10:02pm

The way you’ve described it the obvious first guess is that you have a network problem, not an Ignition problem. The only 4 OPC UA connections that are talking to other hosts on the network all stop responding at the same time? Yeah…

scotteolson3 · November 1, 2021, 10:04pm

My thinking exactly. There are only 2 things that all 4 connections have in common: the gateway and the network, which makes them the obvious suspects.

I think the gateway connection is a bit easier for me to diagnose/troubleshoot which is why I am starting here.
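
One way to help separate the two suspects is to log, from the gateway host, whether each device is still reachable over the VPN at the moment the connections fault. The following is only a minimal sketch: the addresses in `ENDPOINTS` are hypothetical placeholders, port 4840 is the IANA-registered OPC UA port and may not match these devices, and `probe` and `probe_all` are illustrative helpers, not part of Ignition.

```python
import socket
from datetime import datetime, timezone

# Hypothetical endpoints: replace with the real device addresses.
# 4840 is the registered OPC UA port; the devices here may use another.
ENDPOINTS = [("10.0.0.11", 4840), ("10.0.0.12", 4840)]


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe_all(endpoints, timeout=3.0):
    """Probe each endpoint once and return one timestamped log line per endpoint."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = []
    for host, port in endpoints:
        state = "reachable" if probe(host, port, timeout) else "UNREACHABLE"
        lines.append(f"{stamp} {host}:{port} {state}")
    return lines
```

Calling `probe_all(ENDPOINTS)` on a schedule (for example once a minute) and appending the lines to a file gives timestamps to compare against the SessionFsm log. If every device becomes unreachable at the same moment the sessions fault, that would point at the VPN or network path rather than the gateway.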

scotteolson3 · November 1, 2021, 10:38pm

I am capturing the traffic until they fault again. I will report back when they do!

Sometimes it’s once per day other times it’s every hour.

It is really frustrating to diagnose when there is no consistency.

Kevin.Herron · November 1, 2021, 10:45pm

Wireshark (and its command-line counterpart tshark) has good support for configuring traffic filters and for capturing into rolling file buffers, which makes it easier to leave a capture running for a long time.
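
As a rough illustration of the rolling-buffer idea, the sketch below assembles a tshark command line that captures into a ring of files with a capture filter. It uses tshark's `-i`, `-f`, `-b filesize:`, `-b files:`, and `-w` options. The interface name, device IPs, output path, and sizes are all assumptions for illustration, and port 4840 is the registered OPC UA port, which may differ from what these devices actually use. `build_tshark_command` is a hypothetical helper.

```python
import shlex


def build_tshark_command(interface, device_ips, port=4840,
                         out_path="opcua.pcapng",
                         file_size_kb=51200, file_count=20):
    """Build a tshark argument list for a filtered ring-buffer capture."""
    # Capture filter (BPF syntax): only TCP traffic on the assumed
    # OPC UA port to or from the listed device addresses.
    hosts = " or ".join(f"host {ip}" for ip in device_ips)
    capture_filter = f"tcp port {port} and ({hosts})"
    return [
        "tshark",
        "-i", interface,                  # adapter to capture on
        "-f", capture_filter,             # capture filter
        "-b", f"filesize:{file_size_kb}", # start a new file after this many kB
        "-b", f"files:{file_count}",      # keep only this many files in the ring
        "-w", out_path,                   # base name for the output files
    ]


# Hypothetical values: substitute the real adapter and device addresses.
cmd = build_tshark_command("eth0", ["10.0.0.11", "10.0.0.12"])
print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the capture host. With the ring buffer, the oldest file is overwritten once the file count is reached, so disk usage stays bounded while waiting for an intermittent fault.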

scotteolson3 · November 11, 2021, 6:38pm

Good news (or bad, since the problem persists) - I was running a Wireshark capture when the connections went down.

The timing of the event roughly corresponded to pushing a project update to the instance that is running this project. Is it possible that the connections are dropping because of that?

Kevin, can I send the Wireshark file to you in a DM?

Kevin.Herron · November 11, 2021, 6:40pm

Sure, try to DM it. If that doesn’t work I can get you a Dropbox upload link.

scotteolson3 · November 11, 2021, 6:44pm

Ah, yeah, there is a 4MB limit and the file is 8.3MB. The Dropbox link would be great.

Kevin.Herron · November 11, 2021, 6:54pm

Can you tell me the IPs and ports involved? I’m not seeing any OPC UA traffic right away. Also not sure what the two different caps are.

scotteolson3 · November 11, 2021, 6:57pm

The 50.x.x.x:8088 is the Ignition instance while the 10.x.x.x is local traffic.

Importantly, our Ignition instance is hosted in the cloud and all the traffic is routed through a site-to-site VPN.

Kevin.Herron · November 11, 2021, 6:58pm

Sorry, it doesn’t look like you’ve managed to capture any of the traffic we need. I know with my local VPN connection it presents in Wireshark as its own network adapter that I can capture on. Not sure what your setup will look like.

There’s only traffic between 2 IPs in here, a 50.x and a 10.x, and none of it is OPC UA.

scotteolson3 · November 11, 2021, 6:59pm

Gotcha, I will expand the net and try again! Thanks Kevin

Kevin.Herron · November 11, 2021, 7:01pm

Oops, ignore my reference to captures plural, I had you upload into an existing folder that had an old capture in it already. But what I said above still applies to your capture.