Failover to Redundant Backup not working

Hello Everyone,

A facility running Ignition 7.9.13 was doing a Windows OS upgrade on the primary gateway yesterday, and the engineer pointed out to me that he could not force a failover to the backup gateway. I wasn't able to take a look that day, but he sent me the following screenshots.

Can anyone shed some light on what these errors mean? This is what the client showed when he restarted the primary gateway to finish the OS upgrade. He told me that no Vision clients ever switched over to the backup gateway.

For context, I am looking into the issue today and have done the following:

  1. Forced a resync to the backup machine, which completed without errors.
  2. Tried to force the failover from the primary, which did nothing.
  3. Tried to force the failover from the backup, which caused all clients to reset, but they reconnected to the primary gateway after the reset.

The following is a portion of the gateway log on the primary gateway:

As you can see, a second after the primary gateway switched to the 'cold' state, the backup reported that its activity was cold, which seems wrong to me…

Then the primary gateway switched back to active, which caused the Vision projects to restart. That is what I saw happen in real time while watching the process.

Any help or troubleshooting advice would be much appreciated!

Thanks,
Brandon


Check all of the installed modules on both servers. In a redundant pair, while synchronized, modules installed on the master will be simultaneously installed on the backup. Otherwise, you have to install manually on the backup. If a module is installed while running on the backup, I think you have to manually install on the master (haven’t tested that).
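One way to do that comparison without eyeballing two status pages is a small script run against copies of each gateway's module folder. This is a minimal sketch, assuming the installed modules are the `.modl` files in a folder such as `user-lib/modules` under the Ignition install directory (an assumption; check your own install). It compares file names only, so a module present on both nodes but at different versions will not be flagged.

```python
import sys
from pathlib import Path


def installed_modules(modules_dir):
    """Return the set of .modl file names found in one gateway's module folder.

    The folder location is an assumption (e.g. <install>/user-lib/modules).
    """
    return {p.name for p in Path(modules_dir).glob("*.modl")}


def compare_modules(primary, backup):
    """Return (only_on_primary, only_on_backup) as sorted lists of file names."""
    return sorted(primary - backup), sorted(backup - primary)


def report(primary_dir, backup_dir):
    """Print any module files that exist on one node but not the other."""
    only_primary, only_backup = compare_modules(
        installed_modules(primary_dir), installed_modules(backup_dir)
    )
    for name in only_primary:
        print(f"only on primary: {name}")
    for name in only_backup:
        print(f"only on backup:  {name}")
    if not only_primary and not only_backup:
        print("module file names match on both nodes")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        report(sys.argv[1], sys.argv[2])
    else:
        print("usage: compare_modules.py PRIMARY_MODULES_DIR BACKUP_MODULES_DIR")
```

Any file listed as "only on backup" (like the SFC module in this thread) is a candidate for removal from that node while it is shut down.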

Thank you Phil for the tip.

Turns out you are correct:

Primary Gateway:

Backup Gateway (the SFC module is in here for some reason):

Can you give any advice on the least disruptive way to fix this? I don’t need SFC, so how can I get it off the backup gateway?

EDIT:
I can’t uninstall this module on the backup, so is there a download page for individual modules that Inductive Automation manages?

Thanks,
Brandon

Shut the backup down. Delete that module file from the install folder. Restart the backup.

I found the .modl file before you replied, so the primary gateway now has the same modules as the backup. I tried another failover and it still didn’t work, so I just finished doing a resync and will see what happens when a failover is initiated.

I would also, at some convenient point, uninstall all of the modules you aren’t using.

Agreed.

Here are the logs from when I tried the latest failover:

I don't understand why the projectState is Unknown on the backup. The status page shows that the sync status is Good. Would doing a full restart of the backup gateway help the projectState?

Based on the later log entries, it was passing through that state on the way to active when it noted the lack of synchronization. (Note the different loggers. Not sure of the significance.)

I don't have any clients using 7.9.13 with redundancy. I wonder if there's a known issue. If the obvious items don't fix it, it may be time to reach out to support.

Nightly Changelog: 7.9.15-b20200714
Ignition Platform
14915:Warn when client launcher JRE selection fails
Now aborting client launch if no system JRE can be found.
12211: Client won’t connect to backup node in specific scenario
Fixed an issue preventing Vision clients from connecting to backup gateways in certain redundancy scenarios.

This is from the 7.9 changelog post, so there may be a known problem. Thank you for the help, Phil. I will ask support to take a look at this.

Was this ever resolved? One of my clients is having very similar issues on the same version.