Ignition Redundancy Setup Problem

Dodonnell999 · February 14, 2012, 12:21pm

I have an Ignition 7.3.2 (b533) system and I want to set the system up with redundancy, I have a second server and second licence but can’t seem to get it to work. Both servers are in a DMZ and have a second NIC facing on to a domain that have a third public IP address for accessing the system. The following is a summary of what I have:

Server1 has two NICs:
NIC1 = DMZ_IP1
NIC2 = DOMAIN_IP1
PUBLIC_IP1

Server2 has the same:
NIC1 = DMZ_IP2
NIC2 = DOMAIN_IP2
PUBLIC_IP2

So I’ve tried setting up the redundancy with the auto detect on everything selected and with binding the interfaces - all that happens is that the backup node appears for a few seconds, then disappears. It does this for 3 or 4 times then it shuts down the ignition service on the backup machine.
Also when I have server1 set up as a master I can’t launch the client or designer from outside the domain (ie, via the Public IP address)

I get the entries below in the console. ( I have since added the Modbus module that was causing the imcompatible module manifest but I haven’t tried the redundancy since )

Could the imcompatible module manifest have caused the redundancy to fail?

Is the problem to do with the fact that the public IP address is being used to access the system?

What settings should I have for this knid of system?

(I) 11:23:15 MasterTCPChannel (id=101) Shutting down channel.
(I) 11:23:15 MasterTCPChannel Error occured while communicating with backup client. Client has likely become unavailable.
java.io.EOFException
more
(I) 11:23:14 MasterTCPChannel (id=102) Shutting down channel.
(I) 11:23:13 MasterStateManager System UID has been updated to ‘c7cb287a-ecc8-4a87-853a-6713a0f98256’
(I) 11:23:08 ENGINE Successfully backed up instance ‘settings’ to ‘C:\Windows\System32\config\systemprofile\AppData\Local\Temp\tempdir1833838871640661806340882722101\db_backup.tar.gz’
(I) 11:23:04 ENGINE Initiating backup of instance ‘settings’
(I) 11:23:04 ENGINE dataFileCache commit start
(I) 11:23:04 ENGINE Checkpoint end
(I) 11:23:04 ENGINE defrag end
(I) 11:23:04 ENGINE open end
(I) 11:23:04 ENGINE open start
(I) 11:23:03 ENGINE dataFileCache commit start
(I) 11:23:03 ENGINE defrag start
(I) 11:23:03 ENGINE Checkpoint start
(I) 11:23:03 MasterStateManager Backup node has requested a ‘full’ restore.
(I) 11:23:03 MasterTCPChannel System restore initiated by backup node. System will provide a ‘full’ restore file.
(I) 11:23:02 MasterStateManager Successfully registered new connection to redundant master from ‘/10.49.31.83:49600’
(I) 11:23:02 MasterTCPChannel Starting redundancy channel id 102
(I) 11:23:02 MasterTCPChannel (id=102) Connected to address ‘/10.49.31.83’
(I) 11:23:02 MasterTCPChannel Backup node’s module manifest is incompabile with the master’s. A full backup will be performed. Value: [(fpmi/5.3.2 (b301)), (fsql/5.3.1 (b86)), (help/1.3.1 (b127)), (mobile/1.3.1 (b55)), (modbus-driver2/2.3.2 (b166)), (rept/1.3.2 (b46)), (siemens-drivers/1.3.1 (b110)), (symfact/2.5.1 (b26)), (udp-driver/1.3.1 (b64)), (xopc/1.3.2 (b317)), (xopc-drivers/1.3.2 (b148))]
11:23:02 MasterTCPChannel The module ‘modbus-driver’ is missing from the backup system.
(I) 11:23:01 MasterTCPChannel Peer node information has been updated: ClusterNode(address=10.49.31.83, httpAddresses=[10.49.31.83:8088/main], sessionCount=0, loadFactor=1, activityLevel=Cold)
(I) 11:23:01 MasterTCPChannel Reporting master start time of Tue Feb 14 11:22:08 GMT 2012
(I) 11:22:59 MasterStateManager Successfully registered new connection to redundant master from ‘/10.49.31.83:49599’
(I) 11:22:59 MasterTCPChannel Starting redundancy channel id 101
(I) 11:22:59 MasterTCPChannel (id=101) Connected to address ‘/10.49.31.83’
(I) 11:22:25 MasterStateManager Join-wait time has expired, will refresh system state.
(I) 11:22:21 MasterTCPChannel (id=99) Shutting down channel.

Colby.Clegg · February 14, 2012, 4:26pm

Hi,

Yes, the problem is that the backup node doesn’t have the correct modules. The “full” restore is supposed to provide them, but doesn’t appear to be working correctly.

To solve this, do the following:

Stop the backup server
Copy all of the modules from the server ({Install Dir}\user-lib\modules) over to the backup, and replace the modules there with them.
Restart the backup.

After connecting to the master, the backup should restart one more time, but you should see that it is only doing a “data only” restore.

Regards,

Dodonnell999 · March 1, 2012, 5:20pm

I’ve finally got around to updating the backup server with the same module manifest as the master and now the system goes into a master backup configuration, but I can’t seem to run the client from outside the network by using the Public_IP1 or Public_IP2 - It comes up with an Timed Out Error.
I’ve set up the system to auto detect for the network settings in the redundancy settings page, however I’m not sure whether I need to specify certain IP address for connecting.

The main IP addresses used for connection are the DOMAIN_IP1,DOMAIN_IP2,PUBLIC_IP1 and PUBLIC_IP2. Do I need to specify these IP addresses in the redundancy network settings to make the system work?

Colby.Clegg · March 1, 2012, 6:29pm

Hi,

I think you need to manually specify the addresses under the “HTTP Addresses” setting in the Redundancy settings.

You can leave “Auto-detect network interface?” as TRUE, as that is only used for the connection between the backup and master. The HTTP Addresses are the ones that are sent to clients, so change “Autodetect HTTP Address” to false, and enter in the addresses for that node. That is, you’ll need to do this on the master and the backup, but only put the addresses for the node you’re setting them on. In other words, the master will have DOMAIN_IP1, PUBLIC_IP1, and the backup will have DOMAIN_IP2,PUBLIC_IP2. When the two systems are connected, their lists will be combined and sent to the clients.

Hope this helps,

Dodonnell999 · March 2, 2012, 10:31am

Thanks for that Colby, it worked a treat!

I just have one more question, I have 6 devices on the system and all 6 are shown as connected on the Master server, however only 3 are shown as connected on the backup server. Should all six devices be shown as connected on the backup server or does it only try connecting if the Master is gone?

The problem may be a firewall problem from the backup server to the devices, however I want to be sure the devices should show connected before I get the customer to check this.