Why so little consensus on UNS/OPC-UA debates?

Hello everyone. I’ve been reading through OPC-UA / UNS discussions for several years now, and I keep seeing strong arguments on all sides, but no real convergence on what is optimal. I can only assume this is confusing for end users, too.

I’ve started to consider that the disagreements have less to do with tribalism, sunk costs, and other such factors than with starting from different assumptions about what is primary, what counts as explanation, and where authority should sit. I notice that these debates rarely make those assumptions explicit.

So the same debates repeat (UNS vs OPC UA, broker vs model, edge vs cloud), but don’t really resolve.

I’m curious to see if others see it the same way, or if I’m missing something.

I tried to think this through more carefully and make those starting assumptions explicit here. This is more aimed at IIoT architects, automation/controls/SCADA/MES engineers. Am I on/off target? Thanks in advance for any feedback.
https://medium.com/p/dbc4feb193e1

Honestly -- and I know this might not be a satisfactory answer -- I think it just depends. OPC-UA is great if everyone can reach consensus on the information model before deployment, whereas UNS is good where there is on-the-fly discovery or development.

There is also the question of the source of absolute truth for your data: is it your information model or the individual messages published to your UNS broker, and where does ultimate authority lie in deciding what is true? Do you believe your model precedes your data or emerges from it? As for the data streams, are they coming from a mixed bag of historians, devices, cloud platforms, etc.? And there is the question of how the data flows -- its latency, whether it is asynchronous, closed-loop, or deterministic, whether deadbanding is involved...

I like the idea of OPC-UA at the machine/cell level and then a UNS at the enterprise level to bridge things together; I've worked on a setup like that and it wasn't too bad. Both approaches take a lot of discipline to keep models/namespaces consistent so they don't blow up into something unmanageable. Every situation is unique, and a lot of case-specific questions need to be answered by an experienced team of engineers before the right path forward can emerge.

Comparing UNS to OPC-UA is like comparing apples to oranges. They're related, but not competing technologies.

UNS is a structure/architecture: a way of organizing your data in a unified manner that allows for a large monolithic architecture where everything is organized well. UNS is not a protocol.

OPC-UA is a communication protocol used to serve data to clients/subscribers. There's nothing saying that you can't use OPC-UA to represent/serve the UNS data to clients.

A closer comparison would be between OPC-UA and MQTT. In that case, they're both higher-level communication protocols that allow clients to subscribe to data points. Both will use less bandwidth than polling, but every situation needs its use case evaluated to determine what will work best. Many times, both are used together to get the best solution. In Ignition, the OPC-UA server has drivers that poll end devices to collect data, which populates values in Ignition tags; those can then be pushed with the CirrusLink MQTT Transmission module to a central broker. So you have the OPC-UA server pulling in data and MQTT pushing it to the broker for any other clients to subscribe to.
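That polling-then-publishing flow can be sketched in a few lines. This is a hypothetical illustration, not the actual Ignition or CirrusLink API: the bridge's real job is mapping a tag path into a UNS-style topic and a timestamped JSON payload.

```python
import json

def tag_to_uns(tag_path: str, value, timestamp_ms: int):
    """Map an OPC-UA/Ignition-style tag path (hypothetical naming)
    to a UNS-style MQTT topic and JSON payload."""
    # ISA-95-ish hierarchy: enterprise/site/area/cell/metric
    topic = "uns/" + "/".join(part.lower() for part in tag_path.split("/"))
    payload = json.dumps({"value": value, "ts": timestamp_ms})
    return topic, payload

topic, payload = tag_to_uns("Acme/Dallas/Line1/Press/Temperature", 72.5, 1700000000000)
```

A real bridge would then hand `topic` and `payload` to an MQTT client (e.g. paho-mqtt's `client.publish(topic, payload)`); the point is that the namespace structure lives in the topic mapping, not in the transport.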

Thank you for this feedback. Yes, I agree they’re not competing technologies, and in practice they’re often used together. The reason I grouped them is that they tend to show up in the same architectural discussions and decisions. I'm actually trying to get a level below that. The question I’m really asking - and I understand the article is very long, and maybe needs a shorter version - is more about the assumptions behind things like which layer is allowed to define the real state and what counts as explanation. Or, for example, where decision authority lives under real conditions. Have you seen cases where similar stacks behaved very differently depending on how those were handled? I ask because even when teams combine OPC UA, MQTT, and a UNS-style structure correctly, similar issues seem to show up. For example, a broker/namespace becoming the de facto source of truth for a machine state, or decisions moving into layers (dashboards/site apps) that weren't designed for time-sensitive authority - issues like that.

This.

UNS on OPC-UA as the transport protocol is perfectly normal, and is preferable to MQTT in certain applications.

In regard to the article posted, it's a very long-winded way of saying that a UNS lacks context, and as such it doesn't matter how good your clean data is: if it has no bearing on real-world context, it's no better than dirty data in a lot of cases. OPC/MQTT does not bring context; it brings values. Dashboards can imply context via design decisions and grouping of data, but are inherently unable to show complex context. Context can be (and generally is) gained through the experience of the operators/maintenance staff. The recent change is organisations trying to store context near the data instead of in silos; individual humans are also silos for context.

Thanks for getting back to me. Yes, it is long-winded. I’ll take that on board and try my best to condense it into a video. Thank you for persisting with it. But “UNS lacks context” is not the whole of what I'm saying. That's one dimension, but the others are that authority is often misplaced/implicit, causality is not established by distribution/structure, systems can be well-integrated and still operationally confusing, architectural decisions depend on hidden assumptions, and context alone doesn't resolve decision rights or explanations. I'd just ask: if context mainly sits with people, what determines in practice which state is treated as authoritative, and what gets accepted as the actual cause when different people interpret the same situation differently?

Go take a look at a legacy factory with no central electronic downtime/root-cause system. Half the time the symptom is listed as the cause on any paperwork, the paperwork is never looked at, opinions get aired in the lunchroom, and management doesn't want to waste the maintenance team's time beyond that. Most of the time, if a management request comes down the chain, the maintenance manager or production engineer will answer it with their own understanding, send it up the chain, and that's the end of the discussion. In cases like that, authority trumps being right most of the time.

Simple example I have had:
Factory floor has downtime forms that the floor manager fills out every time there is a downtime event. The floor manager will fill the forms out in a way that always presents themself and their crew in the best light; it's human nature. The floor manager doesn't bother filling out a form for a 30-second downtime; they think it's a waste of time. All the downtimes get recorded as maintenance failures or automation issues.
We implemented a simple downtime event tracker, hardcoded the source of the events back to the individual process stops, e-stops and process alarm limits that caused the plant to stop. The floor manager could only add a reason and notes, they could not change the underlying source of the stop.
Attitudes changed very fast. Management couldn't accept that an e-stop event lasting 10 minutes was an "electrical fault"; suddenly context mattered a lot, and reasons were critically analysed. The real value came when they realised a staff member had not been allocated enough time for a station on the line and was using the process stop button to slow the line more than 300 times a shift, but for less than 10 seconds each time.
Management saw this, asked questions and changed the task on the station. Suddenly they dropped 30+ minutes a shift of downtime.
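The design move in that tracker can be sketched in a few lines (names hypothetical, not the actual system): the machine-sourced facts of a stop event are frozen at capture time, and the floor manager's input lives in a separate, editable annotation.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class StopEvent:
    """Machine-sourced facts, captured automatically; cannot be edited."""
    source: str        # e.g. "E-STOP-3" or "PROC-ALARM-TEMP-HI"
    start_ms: int
    duration_ms: int

@dataclass
class DowntimeRecord:
    """Everything the floor manager may edit lives here, not in the event."""
    event: StopEvent
    reason: str = ""
    notes: str = ""

rec = DowntimeRecord(StopEvent("E-STOP-3", 1700000000000, 600000))
rec.reason = "Operator cleared a jam"      # allowed: annotation
try:
    rec.event.source = "electrical fault"  # rejected: the source is fixed
except FrozenInstanceError:
    pass
```

The point is structural: the record can always be annotated, but the underlying cause stays tied to what actually stopped the line.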

You have to build reports on root data, and gather context to qualify it. You also have to learn how to make the process work for your business with the data you have available. Humans inject opinion into analysis; AI injects hallucination into analysis. Once you think you have discovered a root cause, use the scientific method to qualify your hypothesis. If you cannot do that, it is just that: an unqualified hypothesis.

Thanks for sharing that example. That's really what I was after - an expert take like that. I'm just glad that it lines up so well with a lot of what I was trying to get at in the article. I see that the system didn’t make clear what was primary and what was allowed to count as the cause (which addresses my point that the system must define what is primary and not leave that implicit). The person filling out the downtime report was able to define the cause, even when it didn’t match the actual stop event on the machine (which is exactly the problem - authority was allowed to override what actually happened). And then the event tracker changed that by fixing the source event so the cause had to stay tied to what actually stopped the line, rather than being reassigned in reports (which is the right move because it constrains what is allowed to count as the cause instead of leaving it open to interpretation).

That’s what I’m getting at really. Even if you've got better data and more context, the system still has to make clear which state is binding and what's acceptable as an explanation. A system that relies on people applying the scientific method is still permissive because it allows multiple candidate causes to exist in the first place, and only resolves them afterward through analysis. So what actually enforces it in practice - system constraints like your event tracker or ongoing discipline? If it’s system constraints, then that's an architectural problem. If it’s ongoing discipline, then the system is still leaving the problem open.

Goes back to "In God we trust; everyone else must bring data."

I dealt with the same management-by-Excel philosophy for the first five-ish years I worked here, until I brought in FSQL -- many moons ago.

Okay, there's still a lot of management-by-Excel here, but at least there's data.

And there you have hit on the cause of why something usually fails. You can for example:

  • Have a vision system that looks at a wire harness connection.
  • The vision point fails because the rest of the harness is blocking the view of the camera.
  • You have photographic evidence that the build process was not followed.
  • The decision is made that the vision system is the issue, because we didn't turn on 'X-ray Mode'™ to look through the harness to the connector.
    • Legal disclaimer: This specific decision was for humorous illustration only. No manager, living or dead, is portrayed in this re-enactment.

Yes, exactly. The vision system in your example shows the process wasn’t followed, and the cause still gets assigned to the vision system. So what's actually primary there? Is it the physical event or whatever gets written down afterward? If the system can show one thing and accept another as the cause, then what's binding? Where does authority actually sit? That’s the point really. The system allows a cause that conflicts with the underlying event to stand, so causality and authority stay open and get decided after the fact. That’s the failure described in the article.

Authority comes from people. Data can only be binding if people with authority say so.

There's no magic wand to "fix" people problems. Data can expose people problems, which may help other people act to fix them. But data will never automatically overrule people with authority.

OPC, MQTT, and various branded protocols collect live data.

UNS and ad-hoc architectures organize data, and hopefully help engineers re-use code.

Dashboards and reports present data to people.

People.

Thanks for explaining. If people decide, then every time there are multiple interpretations of the same data, authority has to step in and pick one. Isn't that the system generating ambiguity and asking someone to resolve it every time? A system that never defines what counts as the cause in the first place? It seems to me that if the system leaves multiple interpretations open, then “people decide” just means someone selects one after the fact. That’s exactly the point in the article I linked above - when the architecture doesn’t fix what is primary, that means authority and causality stay open and get assigned instead of determined. That means the same event can be explained differently each time, and the system never becomes reliable or scalable.

Systems don't define themselves. People define systems. Ambiguity can be manufactured by motivated people to suit personal goals/desires. Within a completely unambiguous system.

:man_shrugging:

I'm going to go out on a limb here to say you have unrealistic expectations.

I think people define the architecture, and the architecture should be tailored to your needs. If your data is ambiguous and is up for interpretation, you might need a better architecture that eliminates the ambiguity. That is why 'it depends' is so often the right answer, because every environment is different, and choosing how to build your architecture is so essential. That said, I also think UNS is often just a way of saying 'We want to have a data-driven and event-driven architecture', but I do think there is no one-size fits all approach for defining architecture. It depends on:

  • Sector
  • Company structure
  • The people/skills in the company
  • The legacy that brought a company to where they are

And probably more factors.

Maybe you're right. I'm not sure. But if expecting a system to fix what is primary and what counts as cause is unrealistic, then is it fair to say that ambiguity is part of the architecture by design? That means some events will carry multiple plausible explanations, and authority will select one after the fact. That works at small scale; plants already operate like that. The question is what happens as you scale it.

As the number of assets, sites, and events increases, the number of ambiguous cases increases along with it. Each one requires interpretation before action, and those interpretations vary by context, incentives, and timing. So how does a plant with 20,000 sites keep that consistent? How does one with even 10 sites? At that point, consistency of explanation becomes a limiting factor, doesn't it? Root cause starts to drift and comparisons lose meaning. Improvements become harder to trust because the underlying cause is not stable. That lines up exactly with what I see a lot of plants already reporting, i.e. systems that look good in pilots and then fail when attempting to scale up.

I don't know. But what is actually scaling here - data movement or decision quality? If this expectation is unrealistic, then I suppose the implication should be stated clearly: these architectures can move and organize data effectively, but they do not by themselves establish causality, fix authority, or guarantee consistent interpretation at scale. I just haven't seen that in the Industry 4.0 marketing materials I've read.

Thank you for this. Yes, a couple of other people have responded with the “it depends” point again. What that misses, though - and why it doesn't answer my question really - is that any system that drives action still has to establish what actually happened and what should be acted on. That requirement remains regardless of the environment. Different situations change how a system is built, but they don’t remove the need to make that determination. Every system ends up doing it somewhere, either in the structure of the system or later through interpretation. When it’s left to interpretation, the same event can be resolved differently depending on who looks at it and when, and that shows up in how problems are explained and how actions get taken. Systems can still run like that, but it puts a ceiling on how cleanly they scale. “It depends” comes after this, not instead of it.

This is unrealistic. Systems are what typically need fixing, and other systems can be tools for people to do so.

Scaling a helpful system depends on people in authority supporting the scaling and decision advice produced. Systems cannot "fix" anything without authority granted by people.

{My Bold.} This may be the source of your unrealistic expectations.

By marketing materials, sorry, I mean LinkedIn posts by engineers and IIoT architects, automation/controls/SCADA/MES engineers. Today is the first time I have seen any expert confirm that IIoT architectures can move and organize data effectively, but they do not by themselves establish causality, fix authority, or guarantee consistent interpretation at scale.

Just to clarify, I am not saying that a system fixes reality without people. To suggest otherwise would be bizarre. The question is what the system makes explicit before people act. If the system leaves what happened open to multiple interpretations, then every decision starts from disagreement about the event itself. Authority then has to choose a version of it before anything else can happen. That works when volume is low. As volume grows, that step repeats more often, different people resolve it differently, and the explanation of the same event starts to vary. At that point the system is no longer providing a stable basis for action; it's just feeding interpretations into decision-making. So the point is whether the architecture removes that ambiguity up front or pushes it into every decision. If it’s the latter, then the limitation is structural, and it shows up as inconsistency and difficulty scaling long before anything like AI or “Industry 4.0” delivers value.

Have you ever studied Law?
You are making discussion points that suggest there is a working solution that exists in another industry. Law is a classic example. Very black and white rules, but 1,000,000 ways to apply them to specific situations. How do we solve it? We assign authority to people, and have them adjudicate the situations.

You can't build a system that makes judgments on arbitrary situations that you have not defined.
That is called AI hallucination.