Gateway Network question: if I have two gateways, A and B, cloned from the same gateway backup and therefore sharing the same name, UUID, and certificate keystore, and I put them behind a load balancer (so gateway A receives requests on port 8060 while it's up, and B receives them otherwise), will the Gateway Network cope with this for inbound connections?
Assume that A and B can pull information from the same lower-level PLCs and communicate with the same databases.
@PGriffith any idea here? I tried to derive this from the docs, but the info on Gateway Network session state is pretty sparse.
Pinging IA employees is frowned upon, and pinging most others is fruitless. (I, among others, read everything.)
It has only been four hours. Bumping a topic that quickly is also frowned upon. Four days is more reasonable.
If you don't get an answer, odds are there isn't one, and you need to do the hard work yourself.
Why not just have a redundant pair, which is guaranteed to be supported by the software?
The gateway network is fully aware of redundancy.
That aside, I can pretty much guarantee this won't work as described, "seamlessly", if it works at all. GAN connections rely on websockets under the hood - you won't be able to transparently swap out websockets without something at least temporarily noticing.
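GAN protocol internals aside, the underlying limitation is visible with plain TCP: a long-lived connection cannot be migrated between backends invisibly. Here's a minimal, hypothetical Python sketch (plain echo sockets standing in for the websocket layer; none of this is Ignition code, and the "backends" are just local threads) showing that when the active backend dies, the client's existing connection breaks and it must explicitly reconnect:

```python
import socket
import threading

def start_backend():
    """Start a one-connection echo server standing in for a gateway backend.
    Returns its address and a dict holding the accepted connection, so the
    caller can kill that connection to simulate a backend failure."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    state = {}

    def serve():
        conn, _ = listener.accept()
        state["conn"] = conn
        try:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)
        except OSError:
            pass  # connection torn down during simulated failure

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname(), state

# Client holds a persistent connection to backend "A" and exchanges traffic.
addr_a, state_a = start_backend()
client = socket.create_connection(addr_a)
client.sendall(b"ping")
assert client.recv(1024) == b"ping"

# Backend "A" fails; its end of the connection goes away.
state_a["conn"].shutdown(socket.SHUT_RDWR)
state_a["conn"].close()

# The client's socket does not silently follow a load balancer to "B":
# the next read sees an orderly close (b"") or a reset. Something notices.
try:
    interrupted = client.recv(1024) == b""
except ConnectionResetError:
    interrupted = True
print("client noticed the failover:", interrupted)

# Recovery requires an explicit reconnect (and, for a GAN-style protocol,
# a fresh handshake and rebuilt session state) against backend "B".
client.close()
addr_b, _ = start_backend()
client = socket.create_connection(addr_b)
client.sendall(b"ping")
reconnected = client.recv(1024) == b"ping"
print("reconnected to new backend:", reconnected)
client.close()
```

The same interruption happens at the websocket layer, except there it also means redoing the upgrade handshake and whatever session negotiation sits on top, which is exactly the window where the two gateways' state can diverge.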
Even if it does work, I would advise against it. You're completely on your own in uncharted territory with no backup from our QA or support departments if something does end up weird, and this seems to invite a split brain scenario. Ignition gateways are tragically stateful, and even the first party redundancy sync has tricky issues with state management - running two servers independently, no matter how closely synchronized at the outset, is absolutely going to lead to drift and weird behavior.
What is the problem you're actually trying to solve?
The problem I'm actually trying to solve is: I've spun up a clone of my primary gateway that lags behind the primary by X amount of time for both OS updates and operator changes, so that, e.g., a bad config change doesn't take down my gateway, and neither does an OS patch. The latter is covered by redundancy; the former isn't. The actual 'lagging clone' state is something my platform already has and manages, and I've tested it with a bunch of different pieces of state; this Gateway Network question is the remaining gap.
You might get useful advice (or a polite "hell no") if you contact our sales engineering department; they've got more practical deployment experience than I do.
From my end, I'd still go with "don't bother". Design processes to accept small windows of downtime and you'll be happier when acts of god inevitably occur despite all the technology in the world. If FAANG or w/e can't get five nines, you're not going to. What testing/rollback procedure are you going to use if your IT department decides to mandate Crowdstrike on every VM in the facility, for instance?
Or, reorganize things so that the truly important, can't-ever-fail stuff isn't being handled by Ignition at all.
If you need to test OS changes, etc., spin up a dev/test server in an isolated environment; you could even run it in trial mode. Operator changes you're not going to be able to lag behind, as those are live values for operation of the facility/equipment. Ignition configuration changes, on the other hand, can be lagged or tested in this dev/test environment without affecting the production system, then pushed to the redundant production server when you're ready.
The platform we're building basically handles the entire deployment on this side (including not lagging operator changes to the PLCs/tags, but lagging the operator changes to, e.g., tag configs), as well as allowing that secondary to be promoted to a primary.
I think my concern here would be whether you're violating license agreements with Ignition, unless you're licensing both gateways independently or using a leased license instead of a perpetual license.
Independent license, don't worry