CIP Connection Pool alarm

I have a solid stream of alarms that look just like this:

java.lang.Exception: senderContext=478611 timed out waiting 2000ms for response
at com.digitalpetri.enip.EtherNetIpClient.lambda$writeCommand$7(EtherNetIpClient.java:147)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:656)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
at java.lang.Thread.run(Unknown Source)

Anybody run into anything like this?

Two likely causes:

  • Actual hardware disruption preventing network traffic or PLC message processing.
  • Gateway server load spike or garbage collection event that delayed packet reception longer than two seconds. Look if your log also has ClockDriftDetector events.

Didn’t see any ClockDriftDetector events in the Log

Look at the driver diagnostics for this device. What load do you show there, and what are the average and worst-case response times?

Not to be a negative Norton here, but I can’t even tell WHICH device is having the trouble, or if there are mutliple devices having the problem. The error, you will note, doesn’t tell us from whence it came.

Its the v21+ Logix driver. Go to the status page for each one of those and check the related log entries there.

The gateway that is generating these alarms is version 7.7.4, so I may not have the logs you are thinking of. Knowing that it is the Logix driver V21+ helps considerably, though. I only have 5 devices of that type.
Under Status–> Devices, all of them merely say “Connected”.
Under Configure --> Devices --> Diagnostics, there are a plethora of registers on throughput.
Anything I should be looking at in particular?

Eww. Consider upgrading at least to 7.7.8.

The critical information is whether your scan classes are overloading the connection. I don't recall if v7.7.4 computes this for you. The other is the average response time. Single digit milliseconds is what you should have there. The per-driver logging attributes that would make it easy to tell which device is the problem is a v7.9 feature.

“Eww”?? Well, I got my laugh for the day!
The intention is to move the Gateway up to 7.9.6 at the end of the year.
Here is one of my devices:

aggregate-request-count
Value 16
aggregate-throughput
50thPercentile 73.22498399999999
75thPercentile 80.11576875
95thPercentile 107.61096029999982
98thPercentile 373.50455045999996
999thPercentile 662.747547069
99thPercentile 386.86215403
Count 3132016
DurationUnit milliseconds
FifteenMinuteRate 5.713554041059734
FiveMinuteRate 5.718529133599528
Max 662.805119
Mean 84.82985390175097
MeanRate 3.5552426114487723
Min 50.375536999999994
OneMinuteRate 5.757286083051503
RateUnit events/second
StdDev 61.62825117739546
request-count@2901
Value 16
request-throughput@2901ms
50thPercentile 73.0380885
75thPercentile 79.9987495
95thPercentile 99.5450289999999
98thPercentile 371.74777009999997
999thPercentile 662.7098915739999
99thPercentile 385.61118235000015
Count 2946547
DurationUnit milliseconds
FifteenMinuteRate 5.507278648398961
FiveMinuteRate 5.499619463143251
Max 662.767265
Mean 83.81745323638131
MeanRate 3.3447454399991674
Min 49.481079
OneMinuteRate 5.46266236875047
RateUnit events/second
StdDev 60.14797599774047
request-throughput@5000ms
50thPercentile 62.2052875
75thPercentile 69.25585149999999
95thPercentile 81.50665084999999
98thPercentile 109.45568383999988
999thPercentile 732.3792339950006
99thPercentile 209.00108386000014
Count 175888
DurationUnit milliseconds
FifteenMinuteRate 0.19993120608707532
FiveMinuteRate 0.1999997700942866
Max 737.147598
Mean 67.66569243968871
MeanRate 0.19965551711687782
Min 48.428362
OneMinuteRate 0.19999999999999984
RateUnit events/second
StdDev 39.79458301659297
tag-reads
Count 26432217
FifteenMinuteRate 52.03068307752531
FiveMinuteRate 52.07610472861599
MeanRate 30.00395511423497
OneMinuteRate 52.41853424589911
RateUnit events/second
unscheduled-request-throughput
50thPercentile 62.242827
75thPercentile 72.31036025
95thPercentile 88.20593509999999
98thPercentile 685.3266994200013
999thPercentile 1155.288172
99thPercentile 1155.288172
Count 7643
DurationUnit milliseconds
FifteenMinuteRate 6.719842577141861E-5
FiveMinuteRate 4.2178973528388305E-7
Max 1155.288172
Mean 79.60983601282051
MeanRate 0.008676072373701042
Min 47.375295
OneMinuteRate 5.407661902490511E-22
RateUnit events/second
StdDev 128.0180002576787

What’s your ping time from the gateway to the PLC? Because those are horrible response times. Overloading the device.