I am having a problem with the lack of physical diversity in my backup systems. I am currently running a Stratus Fault Tolerant server as a virtualization host, which, it turns out, is a problem when updating the virtualization host or its software.
What I want to do is run the backup Ignition Gateway on a different host at a remote location with a high-bandwidth connection. When I discussed this with the original integrator and with Inductive Automation, they both warned me against it, but they couldn't explain exactly why it wouldn't work.
I recently talked to another Ignition customer who had a similar problem, and they said, yes, we run the backup gateways remotely, it wasn't a big deal at all.
Does anyone else have experience with this?
I would think your success would largely depend on whether you have any significant device driver polling traffic (Modbus, AB Logix, Siemens native, Mitsubishi, etc.), as that stuff is really sensitive to latency. If the hardware device traffic is OPC UA subscriptions, MQTT, or any other "report-by-exception" technology, you would likely be fine.
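To put rough numbers on why polling is the latency-sensitive case, here is a back-of-the-envelope sketch. It assumes a driver that sends one request and waits for the reply before sending the next (no pipelining or parallel connections, which some drivers do support), and every number in it (request count, device turnaround, LAN and WAN round-trip times) is hypothetical, chosen only to show scale. The point is that polling pays the round trip once per request, while a pushed change pays the path latency once.

```python
# Rough model of a sequential (one-request-at-a-time) polling driver.
# All numbers are hypothetical and only illustrate orders of magnitude.


def scan_cycle_ms(requests_per_scan: int, rtt_ms: float, device_ms: float) -> float:
    """Time to finish one scan when each request must get its reply
    before the next request is sent."""
    return requests_per_scan * (rtt_ms + device_ms)


def report_by_exception_delay_ms(rtt_ms: float) -> float:
    """A pushed change (OPC UA subscription, MQTT) crosses the path once,
    roughly half a round trip, regardless of how many tags exist."""
    return rtt_ms / 2


if __name__ == "__main__":
    requests = 200   # hypothetical register blocks read per scan
    device = 2.0     # hypothetical device turnaround per request, ms
    for label, rtt in (("local LAN", 0.5), ("remote WAN", 20.0)):
        print(f"{label}: poll scan {scan_cycle_ms(requests, rtt, device):.0f} ms, "
              f"pushed change ~{report_by_exception_delay_ms(rtt):.2f} ms")
```

Under these assumptions the same polling load goes from about half a second per scan on the LAN to several seconds over the WAN, while a pushed change only picks up a few milliseconds.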
I use Kepware to translate ModbusTCP to UDTs, etc.
Our official recommendation is that the backup gateway should be local to the master gateway.
Put another way:
Our current failover architecture is very much geared towards "there's a gateway running all the operations in factory X". So, say a hardware fault happens or something goes wrong with one of your servers: you're fine, because you've failed over to the backup that's in a different server cabinet or whatever.
But if something happens to the entire building, like the power goes out, or it catches fire, or whatever, there's no point in "failing over" to something else in the same cabinet, because you're also not going to be running HMIs out of that same facility.
What you're really after with 'offsite' redundancy is probably more like a 'high availability' setup, which also implies (or at least really strongly encourages) that you're not doing direct device comms, to Phil's point, and that you're okay with facilities B, C, and D continuing to run while A is on fire.
There are existing, off-the-shelf ways to do HA for Perspective sessions using load balancers. Vision is more complicated, but still doable. It also implies moving any direct device comms load onto dedicated tag/IO gateways, distinct from your visualization/HMI gateways.
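For anyone unfamiliar with the load-balancer approach, the core idea is health-checked routing with session affinity. The sketch below is a hypothetical, minimal model of that idea, not how Ignition or any particular load balancer is implemented; the gateway names are made up, and in practice you would configure an off-the-shelf load balancer rather than write this. It does not model any transfer of Perspective session state between gateways.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Gateway:
    name: str             # hypothetical HMI gateway name
    healthy: bool = True  # result of the balancer's latest health check


@dataclass
class StickyBalancer:
    """Keeps each session on one gateway while that gateway is healthy;
    moves the session only when its gateway fails a health check."""
    gateways: List[Gateway]
    assignments: Dict[str, str] = field(default_factory=dict)
    _next: int = 0

    def _pick_healthy(self) -> Optional[Gateway]:
        # Round-robin over healthy gateways for new or orphaned sessions.
        for _ in range(len(self.gateways)):
            gw = self.gateways[self._next % len(self.gateways)]
            self._next += 1
            if gw.healthy:
                return gw
        return None

    def route(self, session_id: str) -> Optional[str]:
        by_name = {g.name: g for g in self.gateways}
        current = self.assignments.get(session_id)
        if current is not None and by_name[current].healthy:
            return current  # sticky: leave the session where it is
        gw = self._pick_healthy()
        if gw is None:
            self.assignments.pop(session_id, None)
            return None     # nothing healthy to send the session to
        self.assignments[session_id] = gw.name
        return gw.name
```

Usage: with `gws = [Gateway("hmi-a"), Gateway("hmi-b")]`, sessions spread across both gateways and stay put; marking `hmi-a` unhealthy moves only the sessions that were on it. Because each HMI gateway here is independently active, this avoids the master/backup handoff entirely, which is why it pairs naturally with keeping device comms on separate tag/IO gateways.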
But why? I don't see this issue as any different from the cloud-based solutions that I have recently seen Inductive Automation support. Why is the public cloud any different from my private cloud? Yes, I am running Perspective.
I don't think about this in terms of actual failures; if I have a real failure, everything is down anyway. I focus on eliminating downtime for security updates, which are becoming more and more frequent. My entire issue started with an urgent, mandatory security update.
Inductive's cloud offering has no drivers.
Two gateways running in the same region/datacenter on the same public cloud are "local" enough to each other. Best you can do.
Private cloud maybe, but not "local" to the master.
The key delineation here is that I want to run on a different subnet.
As a point of reference, I ran five plants on a different SCADA system, all set up that way: one server was local and the other was remote, all running ModbusTCP, and I never had a problem. This plant is the pilot project, the first of them to be converted, and I just don't want to give up uptime flexibility as part of this architecture.
Where was Kepware in this picture? If Kepware is local to the hardware for both local and remote Ignition, then the traffic is OPC UA and likely fine. (But Kepware may be your single point of failure.) If Kepware is also remote in the remote case, then you have latency-sensitive Modbus going over your WAN. You would have to look at your Kepware instances to see how much leeway you have with the increased latency.
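One way to check "how much leeway you have" is to compare round-trip times measured across the actual WAN link against the request timeout configured on each Kepware device. The sketch below assumes you already have RTT samples (e.g. from ping or a packet capture) and a device turnaround estimate; the function names and every number are hypothetical. It uses the worst observed sample on purpose, since occasional WAN spikes are what tend to trigger timeouts and retries.

```python
from typing import Sequence


def timeout_headroom_ms(rtt_samples_ms: Sequence[float],
                        device_turnaround_ms: float,
                        request_timeout_ms: float) -> float:
    """Margin between the request timeout and the slowest observed request
    (network round trip plus device turnaround). Negative means at least
    one sampled request would already have timed out."""
    worst = max(rtt_samples_ms) + device_turnaround_ms
    return request_timeout_ms - worst


def timed_out_fraction(rtt_samples_ms: Sequence[float],
                       device_turnaround_ms: float,
                       request_timeout_ms: float) -> float:
    """Share of sampled requests that would exceed the timeout."""
    late = sum(1 for rtt in rtt_samples_ms
               if rtt + device_turnaround_ms > request_timeout_ms)
    return late / len(rtt_samples_ms)


if __name__ == "__main__":
    wan_rtts = [18.0, 22.0, 35.0, 120.0]  # hypothetical WAN samples with one spike, ms
    for timeout in (1000.0, 100.0):       # hypothetical request timeouts, ms
        print(timeout,
              timeout_headroom_ms(wan_rtts, 2.0, timeout),
              timed_out_fraction(wan_rtts, 2.0, timeout))
```

With a generous timeout the spike is harmless; with a tight one, a quarter of these sampled requests would time out and be retried, which is the kind of behavior that only shows up once Modbus is carried over the WAN.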
Also keep in mind that if only your remote comms go down, then your master and your remote will both try to be "active", which then sets up a number of ugly "split brain" scenarios.
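The split-brain case comes from the fact that, from the backup's side, a dead master and a dead link look the same: the heartbeat stops. The sketch below is a deliberately simplified model of a heartbeat-based master/backup rule, not Ignition's actual redundancy logic; the node names are hypothetical. It only shows why "link down, both nodes alive" leads to two active gateways, and why that case becomes more likely when the sync path runs over a WAN.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    running: bool = True


def active_nodes(master: Node, backup: Node, link_up: bool) -> List[str]:
    """Naive rule: the master is active whenever it is running; the backup
    goes active when it stops seeing the master's heartbeat. The backup
    cannot distinguish a failed master from a failed link."""
    active = []
    if master.running:
        active.append(master.name)
    heartbeat_seen = master.running and link_up
    if backup.running and not heartbeat_seen:
        active.append(backup.name)
    return active


if __name__ == "__main__":
    m, b = Node("master"), Node("backup")
    print("normal:", active_nodes(m, b, link_up=True))
    print("link down:", active_nodes(m, b, link_up=False))  # both active
    m.running = False
    print("master down:", active_nodes(m, b, link_up=True))
```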
Do whatever you want. I'm telling you what our recommendation is. You're free to ignore it.
When your backup gateway isn't local to the master, you increase complexity, increase the chance of sync failures or split brain, and overall decrease the reliability. The redundancy system is a bit old and wasn't designed with a remote backup in mind.
I'm not disagreeing at all, I'm just trying to understand: don't the tag splitter and store-and-forward solve the sync and split-brain issues? Probably a dumb question, but I'm interested in learning more.