PLC stuck in Idle status periodically

Once or twice a day, our Allen-Bradley PLCs stop communicating and show “Idle” status on our development servers, and we’re not sure why. We’re on Ignition 8.1.1 at the moment.

On this project we have two tag servers and one client server to handle our high tag count. Initially we were using a single server, and it was getting overloaded constantly as we added tags, and the PLCs would commonly go into Idle status. Since splitting out the servers, memory and CPU never get too high and communications work reasonably well, but we still see devices dropping into “Idle” state and not coming back until the connection is refreshed through the gateway.

I’ve tried increasing and decreasing the Max Concurrent Requests, as well as the timeout parameters, to no avail.

We’re actively developing on these PLCs, and sometimes during tag imports some PLCs go Idle and come back to Connected within a second or two, but sometimes they just get stuck in Idle.

Has anyone else had this problem?

Yes.

Though it was thought to be fixed in 8.1.1. One sufferer recently noted a possible correlation with online creation of program-scope tags. It is definitely correlated with runtime development on the affected PLCs, and there is also a possible correlation with v32 firmware.

When it occurs, it is generally fixable by restarting the Ignition device (open the device config for editing and just hit Save).
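For teams doing this reset often, it can also be scripted. Here's a minimal sketch using Ignition's system.device.setDeviceEnabled from a gateway scripting context, on the assumption that toggling the enabled flag forces the same driver restart as re-saving the config (the device name is just an example):

```python
import time

def bounce_device(device_name, pause_seconds=5):
    # Disabling and re-enabling forces the driver to tear down and
    # rebuild its connection, similar in effect to re-saving the
    # device configuration in the gateway web UI.
    system.device.setDeviceEnabled(device_name, False)
    time.sleep(pause_seconds)
    system.device.setDeviceEnabled(device_name, True)

bounce_device("Line1_PLC")  # hypothetical device name
```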

We have not been able to reproduce this in house yet.

I am going to get with a QA person soon to explain the issue, and then put out a call here on the forum for anybody who is experiencing this relatively frequently and can provide us with both their gateway backup and their PLC program or tag export, so we can hopefully reproduce it. I suspect there needs to be a certain amount of real-world comms load on the PLC while doing online edits to trigger it.


FWIW, I haven't had this experience, and I've performed a substantial amount of online PLC changes on v32 (L83, 80% memory used) over the past 12 months: phase imports, program imports, rung imports, UDT/AOI imports, creation of global and program tags; you name it, I have probably done it. All on various flavors of v8.0.x.

Our PLCs have quite a few tags, and each import is generally pretty large, so it wouldn't surprise me if it's related to communications load.

@pturmel - That's what we've been doing to reset them. Whenever someone on the team says their screens are dead/unresponsive, one of the Ignition admins goes in, opens the device configuration, and saves without making changes. Sometimes I also bump up the timeout or play with the concurrent requests to see if it makes a difference, although so far it hasn't. We are running v32 firmware in the PLCs.

Edit: @Kevin.Herron this happens a couple of times a day for us, so if you'd like to watch our system while we do a tag import or something, we could probably schedule some testing time.

Bumping this thread. It is happening on our system (Ignition 8.1.3, PLC firmware 32.012) now as well. Are there any updates on fixing this issue?

Sure, it was fixed as of 8.1.4.


We are encountering a similar issue today on the primary server in a redundant pair, and we are running 8.1.7. All of the PLC connections where we have pulled tags into the Designer are showing Idle after a period of time. Even if I create a new connection, it goes online for a little bit and then goes Idle. We have been running on our backup server all day as a result.

I don’t think it is related to the bug that was fixed in 8.1.4, but we’re at a loss for how to proceed. We have a ticket open with IA, but we haven’t gotten a response yet, so I thought I would post something on the forum.

Edit: Also, I captured some packets on the server, and it appears that Ignition actually sent a FIN packet after I created a new connection, which caused the PLC to respond with an RST packet.

Also, if I open one of the device connections and hit Save, the Gateway just sits and spins. At one point last night, I waited something like 20-30 minutes or more, and it never finished saving the connection. However, if I just click on Devices while it is spinning, the connection shows up as if it had been saved.

Setting these loggers to DEBUG might help (see the script sketch after the list):

  • com.digitalpetri.enip.ChannelFsm
  • drivers.LogixDriver.LogixBrowseStateManager
  • com.inductiveautomation.xopc.drivers.logix.cip.CipConnectionPool
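A quick way to flip all three in one go, sketched with Ignition's system.util.setLoggingLevel from a gateway scripting context:

```python
# Run against the gateway (e.g. a gateway-scoped Script Console).
# Level names are plain strings such as "debug" or "info".
loggers = [
    "com.digitalpetri.enip.ChannelFsm",
    "drivers.LogixDriver.LogixBrowseStateManager",
    "com.inductiveautomation.xopc.drivers.logix.cip.CipConnectionPool",
]
for name in loggers:
    system.util.setLoggingLevel(name, "debug")
```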

Also if you want to send me a Wireshark capture I can take a look.

That would be great! The Ignition server is 10.160.79.60, and it is trying to connect to a PLC at 10.160.72.1. There was already a device in the Gateway for this IP, but there was no traffic whatsoever, even though we have hundreds of tags Ignition should be monitoring. For this capture, I added a new device at that IP two separate times.

Ignition Traffic to 10.160.72.1.pcapng (885.6 KB)

You should see a bunch of ARP traffic at the top, then some pings from when I made sure the network connection was working fine, and then the TCP and CIP packets from when I established the connection both times.

Edit: Also, I turned on DEBUG for the loggers you mentioned, but I can't upload the log files to this forum because they are too large.

Edit 2: Sorry, I just realized I put the wrong IP for the Ignition server in this post above. I have corrected it. The correct IP is 10.160.79.60.

I can’t fully explain what is going on in the capture. Twice, it shows the driver connecting, browsing, and then a ~15-second delay before a ListIdentity request is issued, followed by what appears to be the client disconnecting.

The ListIdentity is issued as a keep-alive when the connection is otherwise quiet, and although it appears to succeed in the Wireshark capture, the driver seems to think it failed for some reason. If you can turn the logger com.digitalpetri.enip.EtherNetIpClient to DEBUG, it might show more information about why.

At this point the driver transitions to an “Idle” state, which means it’s waiting for something to try to use the connection before reconnecting.

That nothing is trying to use the connection seems to be a separate issue… do you have tags subscribed against this driver?
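For anyone following along: a ListIdentity request is just a bare 24-byte EtherNet/IP encapsulation header (command 0x0063) with no payload. Here's a minimal Python sketch of what the keep-alive puts on the wire, assuming the standard EtherNet/IP TCP port 44818:

```python
import socket
import struct

def send_list_identity(host, port=44818, timeout=5.0):
    # Encapsulation header (little-endian): command, length,
    # session handle, status, 8-byte sender context, options.
    header = struct.pack("<HHII8sI", 0x0063, 0, 0, 0, b"\x00" * 8, 0)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(header)
        return sock.recv(4096)  # raw reply: header plus item list
```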

I am curious what you find, Kevin. The FIN comes immediately after a ListIdentity reply that contains CPF type code 0x0086.

Ah, I saw that, but now that you mention it, I do remember having to add support for that item to the ENIP library. I need to see what version that happened in…

So we might be making progress. We installed the trial version of a module for switching from Class 3 to Class 1 communication. The IA tech (Alex) who was looking at my coworker’s screen indicated that something appeared to be blocking a thread, so we deleted that device (not the module) and restarted the gateway. So far, it appears that the gateway has maintained the connection to the PLCs.

The CIP Security item shows up in Volume 2, Edition 1.27, April 2021. It’s not in Edition 1.25, Nov 2019. I’m missing Edition 1.26 for some reason.

For the new device, no, we did not have anything subscribed against it. However, we have hundreds or thousands of tags subscribed for many of the other PLCs that were listed as Idle.

I would be very interested in a bug report for that class 1 module…

I think I can get that for you. I’ve downloaded the logs a couple of times. Also, here is a thread dump from when we were seeing the blocked threads.

EPWM-Ignition_thread_dump20211129-140214.zip (128.3 KB)

How would I go about getting the bug report to you?
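As an aside, for anyone who needs to capture a similar thread dump without the web UI: here's a sketch assuming system.util.threadDump() is available in your 8.1 version (it returns the dump as a JSON string; the output path is just an example):

```python
# Gateway scripting context; write the dump somewhere the
# gateway service account can reach.
dump = system.util.threadDump()
system.file.writeFile("/var/log/ignition/thread_dump.json", dump)
```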

It doesn’t look like the Logix driver uses a new enough version of the ENIP library to support that CIP Security item in the ListIdentity response. When that’s coupled with a device that has no tags, or a long enough period with no activity, the keep-alive kicking in and issuing a ListIdentity request causes this.

Not sure about the other devices though.
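To make that failure mode concrete, here’s a sketch of tolerant CPF item parsing for a ListIdentity reply body (little-endian fields; the 0x0086 type ID is the one from the capture). A parser that rejects item types it doesn’t recognize would turn an otherwise-successful keep-alive reply into an apparent failure, which matches what the capture showed:

```python
import struct

IDENTITY_ITEM = 0x000C      # standard CIP Identity item
CIP_SECURITY_ITEM = 0x0086  # item newer firmware appends to the reply

def parse_cpf_items(body):
    # Reply body: item count, then (type id, length, data) triples.
    (count,) = struct.unpack_from("<H", body, 0)
    offset, items = 2, []
    for _ in range(count):
        type_id, length = struct.unpack_from("<HH", body, offset)
        offset += 4
        # Keep unknown item types instead of raising a protocol error.
        items.append((type_id, body[offset:offset + length]))
        offset += length
    return items
```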