Scan Class Best Practices - Having intermittent comm issues

Paullys50 · June 30, 2014, 3:22pm

Looking for some advice as I am having intermittent communication problems with my PLC.

Current Setup:

PLC: Allen Bradley 1756-L74, 1756-EN2T Ethernet card. Firmware v20.

This is a development system with little overhead other than background processes and communications to Ignition. It is not talking to any I/O points or running any programs, just enough to interact with the Ignition project. It is in our office on an isolated network, so there shouldn’t be any outside influence.

Ignition v7.6.6
Using the Ignition OPC-UA Driver for ControlLogix communications.

The system is running on Windows Server 2012 Essentials R2.

Currently there is a single scan class, set up as “leased” with the slow rate = 5,000 ms and the fast rate = 1,000 ms. I also have a “historical” scan class, but right now it’s not logging any data points.

The “fast” rate currently has 11,457 items on scan.

The “slow” rate currently has 58,824 items on scan.

These numbers obviously change with the screens that are open in Ignition due to the “leased” nature, and the “fast” count will increase as I add clients to the system. Initially I will have 5 clients, each of which could be on a different screen, so my “fast” rate tag count will increase. I’m concerned that I am already seeing intermittent communications issues. I don’t feel that I’m asking too much of the system, but maybe I am? This project will also be talking to a 2nd PLC, a CompactLogix, in the near future, so more tags will be required. 85,000 total tags is realistic, with more in the future as this is just “Phase 1” of this project.

The number of tags I have is “bloated” because many of them are BOOL tags for status/configuration of my devices. This relates to a feature request thread I created: http://inductiveautomation.com/forum/viewtopic.php?f=71&t=12380

How can I eliminate my comm issues? Am I already taxing the abilities of the OPC-UA driver? This is my first Ignition project, so I’m hoping to chalk this up to inexperience and be pointed in a better direction. I know I can add more scan classes that are “slower” in nature, but I can’t imagine 11,000 tags @ 1 second is that taxing. What about 58,000 @ 5 seconds?
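As a rough way to put the two rates on the same footing, the item counts above can be converted to items per second. This is only a back-of-the-envelope sketch: it assumes each item is read once per scan interval, and it says nothing about how many items the driver packs into a single request. The function name `items_per_second` is made up for illustration.

```python
def items_per_second(item_count, rate_ms):
    """Average items read per second, assuming one read per item per interval."""
    return item_count * 1000.0 / rate_ms


# Item counts and rates as reported for the leased scan class.
fast = items_per_second(11457, 1000)   # "fast" rate, 1,000 ms
slow = items_per_second(58824, 5000)   # "slow" rate, 5,000 ms
total = fast + slow

print("fast:  %.1f items/s" % fast)
print("slow:  %.1f items/s" % slow)
print("total: %.1f items/s" % total)
```

Under that assumption the two groups contribute nearly the same load, so the slow group is not negligible just because its interval is longer.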

Thanks!

Kevin.Herron · June 30, 2014, 3:26pm

Can you grab a screenshot of diagnostics for that device? (There’s a diagnostics link next to its entry in the device list)

Also - what’s your ‘concurrent requests’ setting at for that device?

Curlyandshemp · June 30, 2014, 3:39pm

I have had similar issues with a large # of tags in complex UDTs.
What I found to help is to increase the fast scan class rate to 1250 or 1500 ms,
increase the # of concurrent requests to 4 or 6,
and turn OFF auto refresh in the Clgx driver.

Paullys50 · June 30, 2014, 5:04pm

I’ve attached the diagnostics screenshot, and I’ve increased the “concurrent” connections from 8 to 12 and will monitor.

Curlyandshemp - Your comments have me a bit concerned, as the majority of these tags are fairly in-depth UDTs. I would prefer to keep things at a scan time of 1 second or less; otherwise device actions seem to lag. Sounds like I may have to create various scan classes depending on the actions I need.

Kevin.Herron · June 30, 2014, 7:57pm

Well things look just fine right now. How often do you end up with comm issues? Do they coincide with screen changes or anything that would cause tags to move around from one rate to another in the leased scan class?

Paullys50 · June 30, 2014, 8:51pm

Kevin -

I haven’t found a trigger yet. Oftentimes I’ll simply be in the designer doing some edits and the tags drop out for some time, 5 - 30 seconds, then come back. I have a colleague who is starting to make edits too; I’ll ask him if/when he notices the same thing.

As for the diagnostics, is there something you are looking for in particular that I should watch for?

Kevin.Herron · June 30, 2014, 9:11pm

I was just looking to make sure the req/s for each rate group was what it should be - and it is.

I think online edits can cause intermittent issues… the driver has to re-browse when it detects changes, which sometimes slows down the other comms enough. (unless you disable auto-refresh like curlyandshemp mentioned)

shawnd · July 1, 2014, 3:29pm

I was just coming here to post a similar question when I saw this. I too would like more information about scan classes and best practices. Some general questions I have are:

Does the organization of data types and scan classes have an effect on performance?
Should I have all tags in a data type be in the same scan class?
Should all tags in a window be in the same scan class?
Some of my windows reference multiple data types from the OPC server. If these OPC data types are referenced in multiple windows, is it better to have the OPC tag referenced in multiple different data types (one data type for each window) so the scan classes can be set per window, or to have one reference to the OPC tag in a data type and use it in multiple windows?

My current setup has 3-4 different large data types from the PLC that are referenced on multiple screens over multiple projects. These data types have station-specific information for around 50 possible stations, though usually only around 20-30 stations are being used, so many of the tags would not need to be scanned unless used. I also have setup screens that access these same data types to display information for multiple stations. The tags on these setup screens might be accessed only 3-4 times an hour, for a couple of minutes each time.

All of the tags in my gateway were on the default direct scan class of 250 ms, but after adding another large complex data type, some of the tags on my screen would flash bad quality for half a second or so. I set all of the tags for the new complex data type to a leased scan class with a slow rate of 5000 and a fast rate of 1000, and things would be fine except when I accessed that page: nearly all of the OPC tags in my gateway would go red, and then refresh after 3-4 seconds.

I then split all of the tags in the setup screens into their own data types and tag folders, and set leased scan classes for them all individually; some of them are duplicates of the tags for the HMI station screens. The scan class times vary, but most are around 3000 for a slow time and 750 for a fast time. When I switch between my setup pages the tags will flash red for a second or two, go back to normal, then flash red for another 3 seconds before returning to a normal state and being fine.

When a scan class switches to a fast rate do the tags become instantly bad until all of them are able to scan? It seems that the pages with more tags in the scan class have more issues updating.

I am sure this is a problem with how I have things organized, and I can probably fix it by switching back to direct scan and possibly lowering the scan rate. What I would really like to do, though, is reorganize things and only scan what I need when I need it so that I don’t waste resources.

I would appreciate any and all feedback.
Thanks!
-Shawn

Paullys50 · July 1, 2014, 11:07pm

Kevin -

I’ve been reviewing the system console, and it’s flooded with a lot of communication errors. I’ve tried to attach the log file but it fails for some reason, but this is what I am seeing:

 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_PID.CM_PID[5].Data.PV'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[56].Configuration.10'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_AnaIn.CM_AnaIn[41].Bad_Xmtr'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[40].Status.26'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[114].Interlocks.19'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[10].Configuration.18'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_AnaIn.CM_AnaIn[28].Configuration.3'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[118].Interlocks.1'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[93].Configuration.16'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA]Global.CM_Valve.CM_Valve[64].Status.19'. 
 6:03:30 PM SubscriptionManager Server returned StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] for request to remove item on '[PLC1_AoA

All of these are valid PLC tags, so I have to believe that this is contributing to the random time-outs of tags. It does seem that the boolean tags I created to look at individual bits within a DINT PLC tag are the most problematic.
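One way to check that impression is to tally the logged item paths by whether they end in a numeric bit index. The sketch below is plain Python, not an Ignition API. The regular expression is written against the message format shown above, the helper name `tally_removed_items` is hypothetical, and treating a trailing numeric segment as a bit-of-DINT reference is an assumption. The sample uses a few of the lines above plus the truncated one.

```python
import re
from collections import Counter

# Matches the subcode, the device name in brackets, and the item path.
LINE_RE = re.compile(
    r"Subcode=(\w+)\] for request to remove item on '\[([^\]]+)\]([^']+)'"
)


def tally_removed_items(log_lines):
    """Count log entries by whether the item path ends in a numeric bit index."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            counts["unparsed"] += 1  # e.g. a truncated line
            continue
        last_segment = match.group(3).rsplit(".", 1)[-1]
        # Assumption: a purely numeric last segment is a bit within a DINT.
        counts["bit" if last_segment.isdigit() else "member"] += 1
    return counts


PREFIX = ("6:03:30 PM SubscriptionManager Server returned "
          "StatusCode[Severity=Bad, Subcode=Bad_MonitoredItemIdInvalid] "
          "for request to remove item on ")
sample = [
    PREFIX + "'[PLC1_AoA]Global.CM_PID.CM_PID[5].Data.PV'.",
    PREFIX + "'[PLC1_AoA]Global.CM_Valve.CM_Valve[56].Configuration.10'.",
    PREFIX + "'[PLC1_AoA]Global.CM_AnaIn.CM_AnaIn[41].Bad_Xmtr'.",
    PREFIX + "'[PLC1_AoA]Global.CM_Valve.CM_Valve[40].Status.26'.",
    PREFIX + "'[PLC1_AoA",  # truncated entry
]

counts = tally_removed_items(sample)
print(dict(counts))
```

Run against a full exported log, the same tally would show whether the bit-level tags really dominate or whether they are simply the most numerous tags in the system.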

Kevin.Herron · July 1, 2014, 11:13pm

It’s hard to tell without more context, but those messages may not be as bad as they look.

Why don’t you export the logs from the Console web page and then email them to support?

jay · July 2, 2014, 1:11am

I started a similar thread a few weeks ago.

https://inductiveautomation.com/forum/viewtopic.php?f=70&t=12321&p=44757&hilit=scan+class&sid=06a98937160d8508148c1f967c44adf4#p44757

Your problem is that you have a lot of tags, enough to stretch the available bandwidth of the software, and possibly of the communications, to its limit. My problem is not tags. I have very few tags, but limited communication bandwidth. In the case I am interested in, my Ignition gateway is talking to multi-dropped PLCs on a 450MHz radio at 9600 bps. We could have about the same problem/question, though in my case I’m sure that this is a communications problem, not a software one. I am certainly not running Ignition or the OPC server to its limits.

I’ve been meaning to get back to this.

Kevin wrote, in the referenced thread:

Both servers [either Kepware’s or Ignition’s] under normal conditions will then try to poll your Modbus gateway every 100ms. This is almost certainly too fast.

It is too fast, but there does not seem to be any immediate penalty for doing so. If I create a scan class with a 10000 ms rate, Ignition will tell the OPC server to poll once every 10 seconds. As I lower that 10000 down to 5000, 1000, … 100 the polling gets faster and faster, but obviously no faster than the communication medium can handle. It’s the dog that didn’t bark. No bad qualities are asserted, and the Java VM memory does not fill up with a queue of unsatisfied polling requests. Everything works just fine, until once a month or so, when it doesn’t. Restarting the Gateway solves the problem.
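To see why a 100 ms request cannot be honored on a link like this, it helps to estimate how long one poll occupies the wire. The sketch below is an illustration under stated assumptions, not a description of this particular installation: it assumes Modbus RTU framing for a read-holding-registers transaction (8 request bytes, 5 + 2N response bytes), 10 bits per byte on the wire, and a hypothetical `turnaround_ms` parameter standing in for radio key-up and device response delay.

```python
def min_transaction_ms(registers, bps=9600, bits_per_byte=10, turnaround_ms=0.0):
    """Lower bound on one read transaction, in milliseconds.

    Assumes Modbus RTU read-holding-registers framing:
    request = 8 bytes, response = 5 + 2 * registers bytes.
    turnaround_ms is a hypothetical allowance for radio and device delays.
    """
    request_bytes = 8
    response_bytes = 5 + 2 * registers
    wire_ms = (request_bytes + response_bytes) * bits_per_byte * 1000.0 / bps
    return wire_ms + turnaround_ms


for n in (10, 50):
    print("%d registers: at least %.1f ms per poll" % (n, min_transaction_ms(n)))
```

With these assumptions a single 50-register read already takes longer than 100 ms before any radio delay is added, and several multi-dropped PLCs have to share the link in turn, so the achievable poll interval is set by the medium and not by the scan class.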

So I do not grok the relationship between an OPC server and an Ignition Gateway scan class. I don’t seem to be alone on this.

diat150 · July 2, 2014, 2:24am

What exactly don’t you understand? Keep in mind that Kepware has quite a few settings that could override the Ignition scan class setting. For instance, if you have Kepware set up to request all data at 15000 ms, it won’t matter if you have Ignition set for a 100 ms scan class, because Kepware will never attempt to scan any faster than every 15000 ms.

I think the common sense approach is to set your scan class to a reasonable number for the limitations of the equipment that you are using.
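The Kepware example above boils down to taking the slower of the two settings. A minimal sketch of that rule, with a made-up function name and no claim about how either product implements it internally:

```python
def effective_poll_ms(scan_class_ms, server_min_ms):
    """The device is polled no faster than the slower of the two settings."""
    return max(scan_class_ms, server_min_ms)


# Ignition scan class at 100 ms, OPC server limited to 15000 ms.
print(effective_poll_ms(100, 15000))
# Ignition scan class at 30000 ms, same server limit.
print(effective_poll_ms(30000, 15000))
```

In the first case the server-side limit wins; in the second the scan class is already slower than the limit, so the scan class rate applies.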

Paullys50 · July 2, 2014, 12:43pm

Well, I think I am starting to understand where my problem might be stemming from. In another thread I posted a problem I am having with the system.nav.goBack() function.

https://inductiveautomation.com/forum/viewtopic.php?f=70&t=12505&sid=ffe185ade0704df769cecec6274b5c37

In troubleshooting that, I’ve discovered that my windows aren’t actually closing when I switch to another main window using my tab strip navigation. I’m using 2 tab strips for navigation (Main Area, Sub Area), and I have added additional scripting so that if a tab on the “Main Area” is pressed, the Sub Area tab strip defaults to its first tab.

I have a feeling this scripting is not properly invoking the “swap window” function. Thus windows are left open, and as more clients run and more screens are visited, those screens remain open and all of their SQLTags remain “on scan”.

My Main navigation tab has a bunch of if statements:

[code]
# Overview Screens
# Main tab strip selection
if event.source.selectedTab == "Overview":
    # Set Sub-Area tab strip to first tab
    Property.getComponent('tabOverview').selectedTab = "Mimics/Mimic 001A Overview"
[/code]

I am directly changing the selected tab of the sub-area tab strip. I believe this is probably bypassing the “swap” functionality of the tab strip.

With that said, can someone confirm this? Is there a better method to script this functionality to ensure I am invoking the proper “swap window” event of the tab strip?

Paullys50 · July 2, 2014, 1:31pm

Some further troubleshooting…

I’ve eliminated the scripting as mentioned in the previous post, and the problem of windows not closing still persists. But perhaps I’m misunderstanding the “swap” function.

I have also just tested a single “sub-area” tab strip and opened the windows configured in the tab strip using the “swap window” function in the tab strip customizer, no scripting exists. And the problem persists, windows open but don’t close.

For reference, I have a button I can press that runs the following:

# Display Open Windows
windows = system.gui.getOpenedWindowNames()
system.gui.errorBox('There are %d windows open' % len(windows))

for path in windows:
    system.gui.errorBox(path)

It provides me a means of seeing which windows are open.

Paullys50 · July 2, 2014, 2:21pm

Ok, don't think it's a windows thing either...

Went through the logs and got this error

9:14:42 AM BasicExecutionEngine Task plc1_aoa requesthealthmonitor threw uncaught exception.
java.lang.NullPointerException
at com.inductiveautomation.xopc.driver.api.AbstractDriver$RequestHealthMonitor.run(AbstractDriver.java:1344)
at com.inductiveautomation.ignition.common.execution.impl.BasicExecutionEngine$TrackedTask.run(BasicExecutionEngine.java:573)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Think I'll give tech support a call.

Paullys50 · July 24, 2014, 11:55pm

Just a followup, seems that the problem was with the “leased” scan class. Switching to “direct” scan class has given me stability.

Currently at ~ 75,000 tags being scanned @ 500ms. I’ll probably slow it down to 1000ms.