Hi everyone,
Over the last few weeks I have been having the following problem in a redundant application.
Sometimes, around midnight, the clients lose communication with the gateway for a few minutes (6-7 minutes), and sometimes the info popup about checking the master and redundancy connections is shown.
In the master gateway log I found this:
MasterStateManager 11Dec2018 00:00:58
Redundancy state changed: Role=Master, Activity level=Cold, Project state=Good, History level=Full
MasterTCPChannel 11Dec2018 00:00:58
Peer node information has been updated: RedundancyNode(address=172.31.10.76, httpAddresses=[http://172.31.10.76:8088/main], sessionCount=0, activityLevel=Active, projectState=Good)
MasterTCPChannel 11Dec2018 00:00:57
Received a full runtime state update from the other redundant node.
MasterTCPChannel 11Dec2018 00:00:57
Peer node information has been updated: RedundancyNode(address=172.31.10.76, httpAddresses=[http://172.31.10.76:8088/main], sessionCount=0, activityLevel=Cold, projectState=Good)
MasterTCPChannel 11Dec2018 00:00:57
Reporting master start time of Fri Dec 07 14:33:43 CET 2018
and, a few log entries later, this:
MasterTCPChannel 11Dec2018 00:00:59
Peer node information has been updated: RedundancyNode(address=172.31.10.76, httpAddresses=[http://172.31.10.76:8088/main], sessionCount=0, activityLevel=Cold, projectState=Good)
MasterStateManager 11Dec2018 00:00:59
Redundancy state changed: Role=Master, Activity level=Active, Project state=Good, History level=Full
In the backup gateway log:
Provider 11Dec2018 00:23:49
Starting scan classes due to redundancy state change.
ModbusDriver2 11Dec2018 00:23:49
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
ModbusDriver2 11Dec2018 00:23:49
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
ModbusDriver2 11Dec2018 00:23:49
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
ModbusDriver2 11Dec2018 00:23:49
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
BackupStateManager 11Dec2018 00:23:49
Redundancy state changed: Role=Backup, Activity level=Active, Project state=Good, History level=Full
BackupTCPChannel 11Dec2018 00:23:49
For troubleshooting: The last received message was [CurrentStateMessage[activity=Active, sessions=16, projectstate=null]] The last sent message was [[RTSYNC_MSG, id=300]]
BackupTCPChannel 11Dec2018 00:00:57
Received a full runtime state update from the other redundant node.
BackupTCPChannel 11Dec2018 00:00:56
Peer node information has been updated: RedundancyNode(address=172.31.10.79, httpAddresses=[http://172.31.10.79:8088/main], sessionCount=16, activityLevel=Active, projectState=null)
ProjectRunner 11Dec2018 00:00:56
Setting SQL Bridge project enabled state to 'DISABLED'
ProjectRunner 11Dec2018 00:00:56
Setting SQL Bridge project enabled state to 'DISABLED'
ProjectRunner 11Dec2018 00:00:56
Setting SQL Bridge project enabled state to 'DISABLED'
ProjectRunner 11Dec2018 00:00:56
Setting SQL Bridge project enabled state to 'DISABLED'
Provider 11Dec2018 00:00:56
Stopping scan classes due to redundancy state change.
Provider 11Dec2018 00:00:56
Stopping scan classes due to redundancy state change.
Provider 11Dec2018 00:00:56
Stopping scan classes due to redundancy state change.
Provider 11Dec2018 00:00:55
Stopping scan classes due to redundancy state change.
BackupStateManager 11Dec2018 00:00:55
Redundancy state changed: Role=Backup, Activity level=Cold, Project state=Good, History level=Full
BackupTCPChannel 11Dec2018 00:00:55
Negotiated activity level has changed. The master node has asked this node to become 'not active'
ProjectRunner 11Dec2018 00:00:55
Setting SQL Bridge project enabled state to 'ENABLED'
ProjectRunner 11Dec2018 00:00:55
Setting SQL Bridge project enabled state to 'ENABLED'
ProjectRunner 11Dec2018 00:00:55
Setting SQL Bridge project enabled state to 'ENABLED'
ProjectRunner 11Dec2018 00:00:55
Setting SQL Bridge project enabled state to 'ENABLED'
Provider 11Dec2018 00:00:55
Starting scan classes due to redundancy state change.
Provider 11Dec2018 00:00:55
Starting scan classes due to redundancy state change.
Provider 11Dec2018 00:00:53
Starting scan classes due to redundancy state change.
ModbusDriver2 11Dec2018 00:00:53
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
ModbusDriver2 11Dec2018 00:00:53
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
BackupTCPChannel 11Dec2018 00:00:53
For troubleshooting: The last received message was [[VERSION_OK, id=101]] The last sent message was [CurrentStateMessage[activity=Cold, sessions=0, projectstate=Good]]
ModbusDriver2 11Dec2018 00:00:53
Redundancy ActivityLevel changed from Cold -> Active, scheduling a connect.
and these log entries look strange to me:
Provider 11Dec2018 00:24:02
Stopping scan classes due to redundancy state change.
Provider 11Dec2018 00:24:02
Stopping scan classes due to redundancy state change.
BackupTCPChannel 11Dec2018 00:24:02
Peer node information has been updated: RedundancyNode(address=172.31.10.79, httpAddresses=[http://172.31.10.79:8088/main], sessionCount=16, activityLevel=Active, projectState=null)
BackupTCPChannel 11Dec2018 00:24:02
Project version synchronized, backup node is up-to-date.
BackupTCPChannel 11Dec2018 00:24:02
Peer node information has been updated: RedundancyNode(address=172.31.10.79, httpAddresses=[http://172.31.10.79:8088/main], sessionCount=16, activityLevel=Active, projectState=Good)
BackupTCPChannel 11Dec2018 00:24:02
Peer node information has been updated: RedundancyNode(address=null, httpAddresses=null, sessionCount=16, activityLevel=Active, projectState=null)
BackupTCPChannel 11Dec2018 00:24:02
Master start time was reported to be 'Tue Dec 11 00:01:02 CET 2018' (adjusted to backup clock)
BackupTCPChannel 11Dec2018 00:24:02
Server time sync complete. Server time is different by 2968 ms.
Provider 11Dec2018 00:24:02
Stopping scan classes due to redundancy state change.
Provider 11Dec2018 00:24:02
Starting scan classes due to redundancy state change.
Provider 11Dec2018 00:24:02
Starting scan classes due to redundancy state change.
It seems that, for some reason, a switch to the backup was requested and then, immediately afterwards, a recovery on the master.
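The entries above are pasted newest first, which makes the sequence hard to follow. A small script along these lines could reorder them and point out the longest quiet period. This is only a sketch: it assumes every entry is exactly two lines (logger name plus timestamp, then one message line) and that the paste is newest first, as in the excerpts above. The function names `parse_entries`, `chronological` and `largest_gap` are made up for illustration and are not part of Ignition.

```python
from datetime import datetime


def parse_entries(text):
    """Parse pasted log text made of two-line entries:
    'LoggerName 11Dec2018 00:00:58' followed by one message line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    for header, message in zip(lines[0::2], lines[1::2]):
        logger, date_part, time_part = header.split()
        # %b expects English month abbreviations (the default C locale)
        stamp = datetime.strptime(date_part + " " + time_part, "%d%b%Y %H:%M:%S")
        entries.append((stamp, logger, message))
    return entries


def chronological(entries):
    """Oldest first. The input is assumed to be newest first, so it is
    reversed before the stable sort to keep same-second entries in order."""
    return sorted(reversed(entries), key=lambda e: e[0])


def largest_gap(ordered):
    """Return (gap, earlier_entry, later_entry) for the longest silence, or None."""
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return None
    before, after = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return after[0] - before[0], before, after


# Three entries copied from the backup gateway log above
sample = """\
BackupStateManager 11Dec2018 00:23:49
Redundancy state changed: Role=Backup, Activity level=Active, Project state=Good, History level=Full
BackupTCPChannel 11Dec2018 00:00:57
Received a full runtime state update from the other redundant node.
BackupStateManager 11Dec2018 00:00:55
Redundancy state changed: Role=Backup, Activity level=Cold, Project state=Good, History level=Full
"""

ordered = chronological(parse_entries(sample))
for stamp, logger, message in ordered:
    print(stamp.time(), logger, message)

gap, before, after = largest_gap(ordered)
print("longest silence:", gap, "after", before[1])
```

Running the same functions over the full master and backup excerpts, and comparing the two ordered lists side by side, should make it easier to see which node changed state first.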
Any ideas?
Thank you