OPC-DCOM Stability Problems

cvaldi · October 24, 2012, 1:55pm

After running some clients for a while, Ignition stops working and the following error appears in the System Console log:
Failed to post the message to the thread. ThreadID = 1584, errorCode = 1816
com.jniwrapper.win32.LastErrorException: Not enough quota is available to process this command.

As the OPC server I’m using RSLinx Classic Professional running on Windows Server Standard FE.

Kevin.Herron · October 24, 2012, 2:06pm

What version of Ignition and the COM module are you using?

Colby.Clegg · October 24, 2012, 2:55pm

Hi,

This situation is usually caused by a blockage somewhere else in the system. As Kevin mentioned, the version can play an important role, as the COM module has changed quite a bit over time. However, please also do the following: when the system reports that error, go to System>Console>Threads in the gateway and see if it reports any blocked threads. If so, use “Download Thread Dump” at the bottom of the page to get the thread report, and post it here or email it to support. This can help us see what might be causing the problem.
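For anyone who wants to pre-screen a downloaded dump before posting it, here is a minimal Python sketch that lists the threads marked as blocked. It assumes the dump follows the common HotSpot/jstack text layout (a quoted thread name on a header line, followed by a `java.lang.Thread.State:` line); the file Ignition produces may be laid out differently, so treat the pattern as something to adapt. The sample text is made up for illustration.

```python
import re

def blocked_threads(dump_text):
    """Return the names of threads whose state line says BLOCKED.

    Assumption: jstack-style layout, i.e. a header line starting with the
    quoted thread name, then a 'java.lang.Thread.State: ...' line.
    """
    names = []
    current = None
    for line in dump_text.splitlines():
        header = re.match(r'^"([^"]+)"', line)
        if header:
            # Remember which thread the following lines belong to.
            current = header.group(1)
        elif "java.lang.Thread.State: BLOCKED" in line and current is not None:
            names.append(current)
    return names

# Hypothetical sample, not taken from the dumps attached to this thread.
SAMPLE = '''"opc-read-1" prio=6 tid=0x01 nid=0x630 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
"webserver-3" prio=6 tid=0x02 nid=0x631 runnable
   java.lang.Thread.State: RUNNABLE
'''

print(blocked_threads(SAMPLE))
```

A non-empty result only says that something was blocked at the instant the dump was taken; comparing two dumps taken a few minutes apart is a better indication that threads are really stuck.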

Regards,

cvaldi · October 24, 2012, 6:47pm

I’m using Ignition 7.5.1 (b1122) and OPC COM 1.5.0 (b139).
I’m posting the file with the thread report and the System Console Log.
logs 2012-10-24.bin.gz (208 KB)
thread-dump 2012-10-24.txt (118 KB)

cvaldi · October 25, 2012, 5:18pm

Any news? Do you need me to send more information?

Colby.Clegg · October 25, 2012, 9:10pm

Hi,

The thread dump shows several issues, but unfortunately it’s not quite enough to say exactly what has happened. Here are a few details:

1. There is a block in the transaction group status monitoring that was fixed in 7.5.3.
2. The OPC system is blocked trying to set the active state of a group. It’s hard to say whether this is a real problem, or just what it was doing when you took the thread dump.
3. Other OPC tasks, such as a read operation, are blocked waiting on #2.

I would recommend upgrading to the latest version of Ignition (7.5.3 or the 7.5.4 beta), because that should take care of #1. As for the rest, what is the version number of the RSLinx you are connecting to? We’ve had some similar issues with FactoryTalk Gateway lately, but haven’t been able to narrow them down to a repeatable test.

Regards,

cvaldi · October 25, 2012, 9:55pm

They stayed blocked for several minutes after I took the thread dump.

The version of RSLinx is 2.57.00.14, and I will upgrade to Ignition 7.5.3.

One thing I have discovered is that if I restart RSLinx, all those blocked threads become unblocked, and once the connection is re-established Ignition works fine... for a while. Do you think it could be a problem with the RSLinx configuration?

Colby.Clegg · October 25, 2012, 10:57pm

Hi,

I doubt it’s really a problem with the configuration; more likely it is either something internal to RSLinx or something in the interaction between Ignition and RSLinx. Unfortunately, things like this can be very difficult to troubleshoot, but if we can narrow it down, it might be possible to figure out a way around it.

The “blocked” threads are trying to do something against RSLinx, and are blocked waiting for the call to return. RSLinx may in turn be waiting for Ignition to do something, such as process data values for a subscription, which results in a deadlock: it’s waiting for us, and we’re waiting for it. This particular example isn’t exactly evident in the thread dump, but it is possible.
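To make the “each side waits for the other” pattern concrete, here is a small Python sketch. It is only an illustration of the general deadlock shape, not of the actual COM calls between Ignition and RSLinx: the two locks and the names `gateway` and `server` are stand-ins invented for the example, and a timeout is used so the demo finishes instead of hanging the way a real deadlock would.

```python
import threading

def run_demo(timeout=0.2):
    gateway_lock = threading.Lock()  # stand-in for something the gateway holds
    server_lock = threading.Lock()   # stand-in for something the OPC server holds
    barrier = threading.Barrier(2)
    got_other = {}

    def party(name, mine, theirs):
        with mine:
            # Wait until both sides hold their own resource.
            barrier.wait()
            # A real deadlock would wait here forever; the timeout lets
            # the demo report the failure instead.
            ok = theirs.acquire(timeout=timeout)
            got_other[name] = ok
            if ok:
                theirs.release()
            # Keep holding our own resource until both attempts are done.
            barrier.wait()

    threads = [
        threading.Thread(target=party, args=("gateway", gateway_lock, server_lock)),
        threading.Thread(target=party, args=("server", server_lock, gateway_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_other

print(run_demo())
```

Neither side obtains what the other is holding, which matches the symptom of threads that stay blocked until one party (here, RSLinx being restarted) lets go.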

How many tags are in the system? How many transaction groups? From the fact that the trace showed a read operation, I suspect you’re using “read mode” on some groups… how many are set up like this? You might go to the configuration settings for your connection to RSLinx and disable “Use Async Operations” and see if that makes a difference.

Regards,

cvaldi · October 26, 2012, 1:36pm

The system has 3580 tags and 124 Transaction Groups. I don’t understand exactly what you mean by “read mode”, but most of the transaction groups are in “OPC to DB” update mode.

I have upgraded to Ignition 7.5.3 and will let you know how it works.

Regards.

cvaldi · October 26, 2012, 2:15pm

After the upgrade to Ignition 7.5.3, the Ignition Gateway could not initialize until I restarted RSLinx, so the problem is still there.

Colby.Clegg · October 26, 2012, 5:21pm

The gateway couldn’t initialize? So it stayed at “starting” until you restarted RSLinx? Does it do this each time you restart the gateway?

If so, can you generate a thread dump from the “Gateway Control Utility” while it’s in this state? I would like to see what it’s blocked on. Also, how many transaction groups do you have? How many SQLTags, and scan classes?

I currently suspect that the way that subscriptions are managed for transaction groups is causing an issue with RSLinx. If the thread dump supports this, we can try adjusting things there and see if it helps your system. I’m also trying to track down that version of RSLinx. What kind/how many devices are you connecting to? How many tags are you using overall?

Regards,

cvaldi · October 26, 2012, 6:51pm

Yes, it stayed at "starting" until I restarted RSLinx. I have tested it two more times with the same result. I waited for more than 10 minutes before restarting RSLinx, and when I restarted it the Ignition Gateway state changed to "started" almost immediately.

[quote="Colby.Clegg"]
If so, can you generate a thread dump from the "Gateway Control Utility" while it's in this state? I would like to see what it's blocked on. Also, how many transaction groups do you have? How many SQLTags, and scan classes?[/quote]
I'm posting 2 thread dumps taken while the Ignition Gateway was in the "starting" state.

The system has 3580 SQLTags, 14 scan classes and 128 Transaction Groups. All Transaction Groups are set to Subscribe OPC data mode.

[quote="Colby.Clegg"]
I currently suspect that the way that subscriptions are managed for transaction groups is causing an issue with RSLinx. If the thread dump supports this, we can try adjusting things there and see if it helps your system. I'm also trying to track down that version of RSLinx. What kind/how many devices are you connecting to? How many tags are you using overall?[/quote]
I'm connecting to 1 Logix5573 processor and 2 databases.
thread dump ignition (starting) 2.txt (32 KB)
thread dump ignition (starting).txt (32 KB)

Colby.Clegg · October 26, 2012, 10:08pm

Hi,

Thanks for those. They show the service blocking on startup of the groups, and in particular the groups blocking on RSLinx. The interesting part is that it’s blocked in a slightly different location each time, once on adding items to the subscription, and once when simply pausing the subscription, which it does right before it adds the items.

Just out of curiosity, if you go into RSLinx>DDE/OPC>Communication Events, are there many errors? In looking around, I’ve found a number of reports like this one that show that an overflow in error messages can cause OPC problems with Linx. Now, it seems like it’s happening a bit too quickly in your case to be this, but who knows.

Just out of curiosity, have you tried connecting directly through our AB driver? I know it might be a pain to change all of the item paths (though with the group XML export you could probably search and replace, since I think the core address structure is very similar), but even if we got through this particular problem, you may just hit a different problem in the future. RSLinx is notoriously bad as an OPC server.
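As a rough idea of what that search and replace could look like, here is a minimal Python sketch. It assumes the item paths in the export carry a bracketed prefix such as `[Topic]Tag` and that only that prefix needs to change; the element name, attribute name, topic name and device name below are all hypothetical, so check a real export before relying on it.

```python
def retarget_item_paths(xml_text, old_topic, new_device):
    """Swap the bracketed prefix of item paths; return (new_text, count).

    Assumption: paths look like '[Topic]Tag.Member' and the part after the
    bracket can stay unchanged.
    """
    old_prefix = "[" + old_topic + "]"
    new_prefix = "[" + new_device + "]"
    count = xml_text.count(old_prefix)
    return xml_text.replace(old_prefix, new_prefix), count

# Hypothetical fragment; a real group export will have a different layout.
SAMPLE = (
    '<Item path="[LinxTopic]Line1.Speed"/>\n'
    '<Item path="[LinxTopic]Line1.Count"/>\n'
)

converted, changed = retarget_item_paths(SAMPLE, "LinxTopic", "PLC1")
print(changed)
print(converted)
```

Working on a copy of the export and checking the replacement count against the number of items you expect is a cheap sanity check before importing the groups back.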

Regards,

cvaldi · October 29, 2012, 1:05pm

I checked while the Ignition Gateway was starting: at the beginning there are no errors, but they start to appear after a while. After about 10 minutes there are around 100 errors.

Yes, I've tried. I changed all transaction groups to your OPC UA server and started to receive a lot of timeout messages in the log. Right now I'm running some transaction groups through your OPC UA server and the rest through RSLinx. This configuration keeps the system working for longer, but eventually it fails anyway. Tell me if you would like me to send you the log with the timeouts, and I will change the configuration back and capture it.

cvaldi · October 29, 2012, 1:28pm

Here is the log where the timeouts appear.
logs 2012-10-29.bin.gz (271 KB)

cvaldi · October 30, 2012, 5:45pm

With all transaction groups moved to Ignition OPC UA, the system has not failed in 2 days, but I get a lot of timeout messages in the Console Log.
Could you tell me if these timeouts could affect the performance of the system?

Colby.Clegg · October 30, 2012, 8:17pm

Hi,

Try checking the “Diagnostics” link next to the Device listing in the gateway config section. What do the numbers look like? Posting a screenshot or a copy/paste of that page could help us see what’s going on.

Regards,

cvaldi · October 30, 2012, 9:00pm

Hi,
I’m attaching the information you asked for.

Today my system started to be unstable again: from time to time the quality of all tags changes to “Bad quality”, and after a while they recover. Maybe I should move all tags to Ignition OPC UA, but I don’t know if it would work properly because of the timeout problem I mentioned.

Regards.

thechtman · October 30, 2012, 10:26pm

I’m not current with your system, but if it is ControlLogix or CompactLogix then we have an updated Allen-Bradley drivers module that is currently going through testing that may solve the timeout problems. It should be available in the next day or so. I’ll post the version number on this thread after it is available on the beta download page.

The other possible cause is the communication load on the PLC. Are there other systems/PLCs communicating with the PLC you are having timeout problems with?

cvaldi · October 31, 2012, 12:17pm

OK, thanks, I’ll be waiting for the update, but is it a beta version? I don’t know if we are allowed to install beta versions of software on our systems, so maybe it will only be useful for testing purposes.

As for the communication load on the PLC, it is at about 10%-15% of its capacity, and we have no problems with other devices communicating with it.