I have upgraded to 7.5.10, and I keep getting the same error:
xopc.client.stack.UaClient.PublishRequestPump
StatusCode[Severity=Bad, Subcode=Bad_InternalError]: ServiceFault: StatusCode[Severity=Bad, Subcode=Bad_InternalError]
at com.inductiveautomation.xopc.client.stack.TCPClientChannel.validateResponseType(TCPClientChannel.java:833)
at com.inductiveautomation.xopc.client.stack.TCPClientChannel.receiveMessage(TCPClientChannel.java:794)
at com.inductiveautomation.xopc.common.stack.UAChannel$1DeliverMessage.deliver(UAChannel.java:967)
at com.inductiveautomation.xopc.common.stack.UAChannel$DeliverToDelegate.run(UAChannel.java:1592)
at com.inductiveautomation.xopc.client.stack.SerialExecutionQueue$RunnableExecutor.execute(SerialExecutionQueue.java:84)
at com.inductiveautomation.xopc.client.stack.SerialExecutionQueue$RunnableExecutor.execute(SerialExecutionQueue.java:81)
at com.inductiveautomation.xopc.client.stack.SerialExecutionQueue$PollAndExecute.run(SerialExecutionQueue.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
Can you attach the logs.bin.gz file exported from the Console area of the gateway, or send it to support? There’s not enough context to figure out what’s going on here from the stack trace alone.
Oooph. You’ve got a bunch of pretty bad things going on.
It looks like you’ve got a number of remote UA connections, some of which are not responding in time to read requests for the server’s current timestamp (this is just used as a sanity check; it should be basically instantaneous), which causes the connection to be reset.
Additionally, some of your remote servers are not responding with keep-alive or data publish responses within the timeout established at connection time, which is leading to timeouts on the publish requests as well.
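For anyone curious what that sanity check actually looks like, here's a rough sketch of the pattern (illustrative only, not Ignition's actual code; the node ids come from the OPC-UA spec, but the timeout values and function names are placeholders):

```python
import concurrent.futures
import time

# Illustrative sketch only -- not Ignition's actual code. The sanity check
# reads two nodes the OPC-UA spec mandates in every server:
#   Server/ServerStatus/CurrentTime (ns=0;i=2258)
#   Server/ServerStatus/State       (ns=0;i=2259)
# Both values live in memory on the server, so the read should be nearly
# instantaneous; if it doesn't complete within the deadline, the client
# assumes the connection is unhealthy and resets it.

CURRENT_TIME_NODE = "ns=0;i=2258"
STATE_NODE = "ns=0;i=2259"

def sanity_check(read_fn, timeout_s=5.0):
    """Return True if the server answered the trivial read in time."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_fn, [CURRENT_TIME_NODE, STATE_NODE])
        try:
            future.result(timeout=timeout_s)
            return True
        except concurrent.futures.TimeoutError:
            return False  # in the real client: disconnect and reconnect

# A healthy server answers instantly; a stalled one trips the check.
assert sanity_check(lambda nodes: ("2024-01-01T00:00:00Z", "Running"))
assert not sanity_check(lambda nodes: time.sleep(0.5), timeout_s=0.1)
```

The point is that the read itself is trivial; if even this times out, nothing else on the connection is likely to be healthy.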
Right now I’m not really sure how to address any of this. The connection(s) must be of really poor quality for this to be occurring, as I’ve seen plenty of remote UA connections before but have not seen anything like this happening.
As for the quality of the OPC connections, all our sites are on T1 or fiber networks, and they are running the KEPServerEX program. Tags seem to be all showing good quality after reinitializing the OPC servers at the sites.
Maybe the Kepware servers at the other end of these problematic connections are overloaded or something. Do you know what Kepware version they’re all running?
I can try giving you a custom build that has exaggerated values for the read request and publishing timeouts to see if it alleviates the problem, but it’s not a viable long-term fix. If the connection is good, there’s really no excuse for the server to fail to respond within the current timeouts; they’re already pretty generous.
[quote=“dfenter”]Is there a way you can disable the function that makes the OPC server disconnect if the timestamp check fails?
Specifically this function: Reading server time and state timed out, disconnecting…
I really don’t care about the timestamp check, and I don’t see why it would cause the server to disconnect…
Every new release it seems like there is a new error that deals with the timestamp check. Just disable it.[/quote]
It has nothing to do with the value returned; it’s just picking two nodes that the OPC-UA spec says will be present in every server and reading them. At the request of you and others, there is no longer any logic that validates the sanity of timestamps on received values.
The problem isn’t that the time returned is wrong. It’s that the act of reading two values that should be readily available in memory timed out, which means nothing else is likely working either (well, in a timely fashion, or at all). The check also verifies that the current session is valid and, more importantly, keeps the session alive in cases where people have very slow subscriptions, or worse, none at all (but still periodically read and write to the server).
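The keep-alive role is easy to picture: a UA session has a negotiated timeout, and any service call from the client resets the clock. Here's a minimal sketch of that idea (assumed numbers, not Ignition's or any server's actual values):

```python
import time

# Illustrative sketch: a UA session expires unless the client sends *some*
# request -- a publish, a read, a write -- within the negotiated timeout.
# The periodic time/state read guarantees at least one cheap call per
# interval, so sessions with slow or absent subscriptions don't silently
# expire on the server.

class Session:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_activity = time.monotonic()

    def touch(self):  # called on any request from the client
        self.last_activity = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_activity > self.timeout_s

s = Session(timeout_s=0.2)
time.sleep(0.1); s.touch()   # keep-alive read arrives in time
time.sleep(0.1)
assert not s.expired()
time.sleep(0.25)             # no traffic at all
assert s.expired()
```

So even if you never look at the returned timestamp, the read is doing real work: it’s the heartbeat that keeps an otherwise quiet session from expiring.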
I understand that it’s frustrating when things break. It sucks that you have to deal with that. But you’ve built a complicated system with a lot of moving pieces and if you change or upgrade a major component there will be fallout. 7.5.10 is by far the most stable release in the 7.5.x series so downgrading is not recommended. Plus you’ll just go back to the old problem with timestamps being off by too much, which is why you upgraded in the first place. Finding and fixing the root cause of the current issue is the best path forward.
Pretending it doesn’t exist by simply removing this built-in sanity check would only lead to other problems caused by whatever the underlying problem with these remote servers is.
When I get into the office today I’ll build you a custom UA module that has ~2x the allowed timeout for the read sanity check as well as an increased timeout on the publish requests. My hope is that these servers are just periodically extremely overloaded and that the increased timeouts will allow whatever is causing that to pass. If this doesn’t work we’ll go from there…
What UA servers are running at these remote locations that are having problems? Are they part of the Siemens PCS7 system you mentioned, are they Kepware, or are they remote Ignition OPC-UA servers? It would also be nice to get an idea of how many tags you’re subscribed to from each of the servers and what the rates of those subscriptions are.
The only time I’ve seen something similar to this was with another customer using Kepware, although the connections weren’t remote. They were periodically doing large batches of writes to tags in the server which for whatever reason would overload the server/driver on occasion. It turns out they were running a pretty old version of Kepware, something like 5.5 or 5.6, and after upgrading to the current 5.12 version the problems disappeared.
The remote location we are having the most trouble with is pulling 500 tags at 2000ms, using KEPServerEX 5.3 with the OPC DA license. I am in the process of upgrading the KEPServerEX program to see if that resolves the issue.
I have other locations that are pulling 7000 tags without any issues, but they are using KEPServerEX 5.10.
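For a rough sense of what those numbers imply (back-of-the-envelope arithmetic only; this is the worst case where every tag changes each cycle, and real publish traffic is usually far lower since subscriptions only report changed values):

```python
# Worst-case update rate at the problem site, from the figures in the post:
tags = 500
rate_ms = 2000
worst_case_updates_per_sec = tags / (rate_ms / 1000)  # every tag changes every cycle
assert worst_case_updates_per_sec == 250.0
```

250 values/sec is a very modest load for a healthy server and link, which points back at the server version or the connection rather than the subscription size.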