Gateway Redundancy

Does anyone have an idea what this error message means?

Cannot invoke "com.inductiveautomation.metro.api.ServerId.toEncodedString()" because "localServerAddress" is null

Which localServerAddress is meant here?

This message appears after approving the incoming connection from the backup node.

We have a ticket in progress to fix this. The workaround in the meantime is to reset the backup's gateway network connection on the master gateway, and the connection should go back to normal.


I’m facing another problem, and I’m not sure what else I could try.

Here’s my current setup:

  • 2 Gateways, both running as Docker containers

  • A Traefik reverse proxy in front of them

I’ve already tried many different configurations:

  1. Clients via HTTPS with SSL offloading on Traefik → Gateway port 8088
    Gateways connect via SSL on port 8060

  2. Clients via HTTPS with SSL offloading on Traefik → Gateway port 8043
    Gateways use a self-signed certificate, connecting via SSL on port 8060

  3. Clients via HTTPS with SSL offloading on Traefik → Gateway port 8088
    Gateways connect via port 8088 without SSL

  4. Clients via HTTPS with SSL offloading on Traefik → Gateway port 8043
    Gateways use a self-signed certificate, connecting via port 8088 without SSL

  5. Clients connect directly to Gateway port 8043
    Gateways use a self-signed certificate, connecting via port 8088 without SSL

  6. Clients connect directly to Gateway port 8043
    Gateways use a self-signed certificate, connecting via SSL on port 8060

  7. Clients connect directly to Gateway port 8088
    Gateways connect via port 8088 without SSL

  8. Clients connect directly to Gateway port 8088
    Gateways connect via SSL on port 8060

In almost all of these scenarios, redundancy works as expected. The manual switchover by pressing the button on the gateway also works correctly.

However, the automatic switchover in the Perspective session only works if both clients and gateways connect directly to port 8088.

I have no idea what could be causing this issue. Manual switchover, synchronization, and even the automatic failback (when the master comes back online) all work perfectly. It’s just the automatic switchover in the Perspective session that fails in every other configuration.

It looks like there is a separate ticket in progress to fix redundancy failover for Perspective sessions. From the description on the ticket, it sounds similar to the situation that you have described.

I saw that the bug should be fixed in the nightly build.

I have now tried it. Without the reverse proxy it works for me; with the reverse proxy it does not.

I have an idea what could be the problem.

The difference is that with the reverse proxy the hello request returns HTTP error 502, while without the reverse proxy the hello request times out.

So my guess is that the session only forwards to the backup if the hello request times out. Could that be the case?

            onHelloRejected(e) {
              var t, o;
              const n =
                null === (t = null == e ? void 0 : e.response) || void 0 === t
                  ? void 0
                  : t.status;
              (x.error(
                () =>
                  `Hello API call failed.  Code=${null == e ? void 0 : e.code}.  Status=${null == e ? void 0 : e.status}.  Message=${null == e ? void 0 : e.message}`,
              ),
                n
                  ? 404 === n
                    ? this.transition(O.ClientActions.NO_PROJECT)
                    : (null === (o = this.idle) || void 0 === o
                        ? void 0
                        : o.maybeTriggerIdleTimeoutAction()) ||
                      this.scheduleNextHelloCheck()
                  : this.maybeRedirectToPeer());
            }

If I understand the client script in the Perspective session correctly, the client only redirects if the master does not send any status code. If the gateway is behind a Traefik reverse proxy, this does not work.

The scenario is that the gateway's Docker container stops working while the Traefik reverse proxy keeps running. The reverse proxy then answers with 502 or 404, depending on the configuration.

Maybe this should be changed so that the session also switches over when the status code is something like 502?
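To make the suggestion concrete, here is a minimal sketch of the decision as I read it from the excerpt above, next to the variant I am proposing. This is not the actual Perspective client code: the function names, the action names, and the set of status codes are all my own assumptions for illustration.

```javascript
// Hypothetical sketch only. Not the real Perspective client implementation.
// Action names are invented for illustration.
const Action = Object.freeze({
  NO_PROJECT: "NO_PROJECT",
  RETRY_HELLO: "RETRY_HELLO",
  REDIRECT_TO_PEER: "REDIRECT_TO_PEER",
});

// Assumption: status codes a reverse proxy may return when its upstream
// gateway is unreachable. The right set depends on the proxy configuration.
const PROXY_UPSTREAM_DOWN = new Set([502, 503, 504]);

// Behaviour as read from the onHelloRejected excerpt:
// only a failure without any HTTP status leads to a redirect.
function currentDecision(status) {
  if (!status) return Action.REDIRECT_TO_PEER; // network error or timeout
  if (status === 404) return Action.NO_PROJECT;
  return Action.RETRY_HELLO; // idle-timeout check, else schedule next hello
}

// Proposed variant: also treat "upstream down" statuses as a dead master.
function proposedDecision(status) {
  if (!status || PROXY_UPSTREAM_DOWN.has(status)) {
    return Action.REDIRECT_TO_PEER;
  }
  if (status === 404) return Action.NO_PROJECT;
  return Action.RETRY_HELLO;
}

console.log(currentDecision(502));  // RETRY_HELLO
console.log(proposedDecision(502)); // REDIRECT_TO_PEER
```

A 404 from the proxy would still be ambiguous in this sketch, since the excerpt treats 404 as "project not found", so the proxy would probably need to be configured to answer with 502 when the gateway container is down.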

I don’t have a good answer on how to deal with the reverse proxy in this situation, but I am asking around on our side to see if anyone else has any ideas.

I collected some additional information.

When I trigger the redundancy switch via the reverse proxy, the network trace shows that the hello request is always sent, but the still-active reverse proxy responds with 502 Bad Gateway. As mentioned earlier, the Perspective error handling only redirects to the backup if the request receives no response.

In the second screenshot you can see a working redirect. In this test I accessed the master directly (bypassing the reverse proxy). After I stopped the master, the redirect happened immediately. In this case, the hello request to the master returns a network error with no status code.

Interpretation: Behind the reverse proxy, a fast HTTP 502 is treated as a (valid) response, so the failover doesn't trigger. When hitting the master directly, the lack of any response triggers the failover as expected.