Manage driver traffic on a low-bandwidth network

We are running Ignition 7.3.7 on a Windows 2008 server. We are communicating with AB MicroLogix PLCs over a low-bandwidth Ethernet network.

The symptom we are getting is that Ignition keeps losing its connection to the PLCs. The driver reestablishes the connection and then has to restore all the tag values. Many times the connection is lost again before the tag values are restored. We have set the Connection Timeout in the driver to 30000 ms and higher with no apparent impact.

I believe the real cause of the problem is that the device driver is saturating our network despite our scan classes being set to poll at a slow rate. The excess traffic increases collisions, lost packets, and dropped connections.

How can we control the network traffic coming from Ignition?

The only thing I can think of is to reduce the impact the reconnect has on your network by disabling browsing in the MicroLogix driver settings (it is a setting… right? I’ll have to check on that and what version it’s present in).

Making your scan classes slow and using polled-read mode instead of subscribe mode may also help.

We disabled browsing long ago. The scan classes are already polling at 60000 ms.

We just switched to using polled-read instead of subscribe. We reduced the connection timeout to 15000 ms, and that increased the traffic.

I am considering using driven scans with the one-shot execution option and polled-read. What do you think?

What’s going to drive the scanclass? Do you not always need these tags to be polled?

Was planning to trigger the scan classes with a tag driven by a timer script.
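
Roughly what I had in mind, as an untested sketch: the tag path below is a placeholder, and it assumes the driven scan class is set up with a "= 1" driving comparison on that tag and one-shot execution.

```
# Rough sketch of the gateway timer script I had in mind (Jython, untested).
# 'ScanTrigger' is a placeholder memory tag configured as the driving tag
# for the scan class (driving comparison "= 1", one-shot execution).
TRIGGER = "[default]Triggers/ScanTrigger"

current = system.tag.read(TRIGGER).value

if current == 1:
    # Re-arm: the one-shot already fired when the value went to 1 on the
    # previous timer run, so drop it back to 0.
    system.tag.write(TRIGGER, 0)
else:
    # Fire: the driven scan class executes once when the comparison
    # becomes true.
    system.tag.write(TRIGGER, 1)
```

With this pattern each poll takes two timer runs (fire, then re-arm), so the timer would need to run at half the intended poll period.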

Unless you’ve got some special logic in mind that will skip the poll on some of the executions of the timer script, you aren’t doing anything different from a regular scan class in polled-read mode.

Is there some way to know when a scan class has finished reading all its items?

Only via the Execution tab under Configure > Console on the gateway. There you can see the average duration of the thread running each scan class.

If you are not already using Wireshark, it is an excellent tool for tuning low-bandwidth connections. If the longest response time is 15 seconds, then the Ignition “Communication Timeout” for the device should be set to 20 seconds. If it is set too long, it will just slow down reconnecting after a timeout.

See www.wireshark.org
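
To put a number on it, the arithmetic is just the worst observed response plus a bit of headroom. A throwaway sketch (the sample times are made up):

```
# Back-of-the-envelope helper for picking a Communication Timeout from
# response times observed in a Wireshark capture. Sample values are made up.
response_times_ms = [850, 1200, 3300, 2100, 15000]

worst = max(response_times_ms)
margin = 5000                      # ~5 s of headroom
suggested_timeout_ms = worst + margin

print("Worst observed response: %d ms" % worst)
print("Suggested Communication Timeout: %d ms" % suggested_timeout_ms)
# With a 15 s worst case this lands at 20 s, as in the example above;
# anything much longer just delays reconnection after a real timeout.
```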

Also, you may already be doing this, but minimize the number of requests being made to the processor by grouping all values together per scan rate. So instead of N7:0, N10:5, N10:50 (three separate requests), group them into N15:0, N15:1, N15:2 (one request), and so on.
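
A toy illustration of the difference, with addresses as (file, element) pairs; the real driver’s request optimization is more involved than this, so treat it as a sketch of the idea only:

```
# Count how many block reads a set of addresses would need if each run of
# consecutive elements in the same data file can be fetched in one request.
def count_block_reads(addresses):
    reads = 0
    prev = None
    for file_num, elem in sorted(addresses):
        # Start a new request unless this element directly follows the
        # previous one in the same file.
        if prev is None or file_num != prev[0] or elem != prev[1] + 1:
            reads += 1
        prev = (file_num, elem)
    return reads

scattered = [(7, 0), (10, 5), (10, 50)]   # N7:0, N10:5, N10:50
grouped   = [(15, 0), (15, 1), (15, 2)]   # N15:0, N15:1, N15:2

print(count_block_reads(scattered))  # 3 requests
print(count_block_reads(grouped))    # 1 request
```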

Hope this helps.

We have consolidated all the data to minimize the number of reads, and we have grouped all the tags into one scan class per PLC to prevent the reads from being fragmented. We still get the same poor performance. Especially disturbing is the long time it takes for the tags to recover following a reconnect.

I have Wireshark on my laptop but we need to install it on the Ignition server to review the traffic.

Can you screenshot the diagnostics page for (one of) the slow devices, and post it here? It’s a link next to the device’s entry on the gateway.

Here’s the Diagnostics


This one is more representative


Ok, just FYI, there’s been a display bug on those diagnostics pages for a LONG time, only fixed in 7.5.1 or the 7.5.2 beta. I need to calculate what the actual times should be…

What are your Communication Timeout settings on each of these?

I might just need to get a beta 7.3.8 with the diagnostics stuff fixed…

Or you can use Wireshark to see.

Connection timeout is 20000 on the first one and 60000 on the second.

I take it back… the diagnostics appear to be correct. I don’t think there is a display bug after all.

At this point I’m not even sure what problem we’re trying to solve… so I’m going to recap the situation as I understand it, for one device.

You’ve got some tags in a 61,000 ms scan class. This scan class is currently set to polled-read mode. The Communication Timeout for this device is set to 60,000 ms. This is probably too high, since the diagnostics screenshot showed the max request duration to be ~3300 ms.

Your real problem is that the network you’re on is constantly killing/closing the connection from the driver’s point of view. This causes your tags to go bad quality.

At this point a few things happen. First of all, the driver attempts to reconnect, which means issuing a connect request to the device. What happens next depends on your scan class mode. In subscribe mode, the tags are re-subscribed and read as soon as the device is connected. In polled-read mode, your tags won’t get read until the next time the scan class executes. This is likely why your tags stay bad quality for a long time after the device disconnects, unless you happen to reconnect at a lucky point in the cycle. In a 61,000 ms scan class this is already going to be painful; in a 1,000,000 ms scan class like the other one you posted, you’re pretty much done for.
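
Purely to illustrate that distinction (not a recommendation): you can always poke a couple of items directly over OPC from a script to confirm the device is answering after a reconnect, without waiting for the scan class. Something like the sketch below, where the device name and addresses are made up and “Ignition OPC-UA Server” is the default internal server name.

```
# Illustration only (Jython): read a couple of items directly over OPC,
# bypassing the scan class. This just reads the raw OPC items; it does NOT
# change the quality of your SQLTags.
server = "Ignition OPC-UA Server"
items = ["[MicroLogix1]N15:0", "[MicroLogix1]N15:1"]   # hypothetical paths

values = system.opc.readValues(server, items)
for path, qv in zip(items, values):
    # Each result is a qualified value with .value and .quality
    print("%s = %s (%s)" % (path, qv.value, qv.quality))
```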

Given the propensity for your network to kill the connection, if you’re not willing to make your scan classes significantly faster I’m going to have to recommend you switch back to subscribed mode. The downside of this is increased “burstiness” in traffic upon reconnect, because now you’re not just sending a connect request but read requests as well. The upside is your tags come back to good quality faster after a reconnect.

Am I missing anything?

There are two problems as I see it.

First is controlling how much traffic we are sending to the network. When we overwhelm the network with polling for data, the connections get dropped. While a single request may complete in 3 seconds or less, the network cannot handle dozens of these requests simultaneously.

Second is the time for the tags to recover after the connection is lost. We understand that with a 60000 ms scan, the tags will not recover immediately. But after 5 or 6 multiples of the scan time we are still getting stale tag statuses.

I understand how longer timeouts increase the queue lengths, but whenever we shorten the timeouts the problems get worse, not better.

Here’s a topic on setting scan classes to prime numbers to reduce overall traffic. May be worth a look.
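
As a back-of-the-envelope check on why that helps, here’s a throwaway snippet that counts how often several scan classes land on the same second over an hour; the rates are hypothetical:

```
# Count seconds (over an hour) where two or more scan classes poll at once.
def coincident_polls(rates_s, horizon_s=3600):
    hits = {}
    for rate in rates_s:
        for t in range(0, horizon_s, rate):
            hits[t] = hits.get(t, 0) + 1
    return sum(1 for n in hits.values() if n > 1)

print(coincident_polls([60, 30, 10]))   # round rates pile up frequently
print(coincident_polls([61, 29, 11]))   # near-prime rates coincide far less often
```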

Also I’m curious for more background information.

- What MicroLogix hardware are you using? Do these have Ethernet built in (like an 1100), or are you going through something like a 1761-NET-ENI?
- What else is communicating with the MLs over Ethernet?

My reason for asking is that while an ML1100 can have up to 16 simultaneous connections (well, 32, since it’s 16 in/16 out, but it’s the 16 in that we’re looking at), the NET-ENI is limited to something like 4. Combine the limited number of connections on the NET-ENI with the serial bottleneck behind it, multiply by the number of processors, and you have a bit of a mess.

I’ve just used one also, so I guess this is an “also-also”: don’t discount the somewhat unlikely, but real, possibility of something jacked up in your infrastructure. I had a switch go bad here at the shop, tucked away in some cabinet or other, and collisions were happening everywhere. Like I said, it’s an unlikely scenario, but also worth a check.