Is it possible to script a failover to a backup Gateway? Also, is the system response time accessible?
We have a pair of gateways that are backed up nightly. The server infrastructure is managed by the IT department, and we don’t have visibility into its configuration.
During the VM backups, the system response time creeps up to ~100ms, which starts causing clock drift and other issues. We know the approximate time the backup task starts. I’d like to monitor the system response time and, if it climbs while we are inside that window and the server is active, fail over to the backup server for the duration of the backup, then fail back when the backup gateway starts showing the same indicators.
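To make the intended rule concrete, here is a minimal sketch of the decision only, in plain Python. The function names, the window times, and the 100ms default threshold are assumptions for illustration; how the response time is read and how the failover is actually triggered are not covered here.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` is in [start, end); the window may cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_fail_over(now, response_ms, is_active,
                     window_start, window_end, threshold_ms=100.0):
    """Fail over only when this gateway is active, the clock is inside the
    known backup window, and the measured response time is at or above the
    threshold (hypothetical default of 100 ms)."""
    return (is_active
            and in_window(now, window_start, window_end)
            and response_ms >= threshold_ms)
```

For example, with a hypothetical 01:00 to 03:00 window, `should_fail_over(time(1, 30), 120.0, True, time(1, 0), time(3, 0))` is true, while the same reading at 04:00 is ignored. The same check, run against the backup gateway, would drive the fail-back.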
What do your CPU and memory usage look like when the backup occurs? If the backup is consuming enough system resources to cause clock drift, you may want to consider allocating more cores/CPUs and/or memory.
The only thing that comes to mind to cause a failover is something like system.util.getGatewayStatus with a timeout set to 100ms, and then, when it times out, executing an external script to terminate the Ignition service. That sounds dangerous though, like a band-aid on a gunshot wound.
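If something along those lines were attempted, one way to reduce the risk of reacting to a single slow reading is to require several slow or failed probes in a row. The sketch below is plain Python with hypothetical names (`timed_call`, `SlowStreak`); the `probe` callable is a stand-in for whatever check is used, for example a wrapper around the system.util.getGatewayStatus call mentioned above, whose exact arguments are not shown here.

```python
from timeit import default_timer


def timed_call(probe):
    """Call `probe` once and return (elapsed_ms, ok); ok is False if it raised."""
    start = default_timer()
    try:
        probe()
        ok = True
    except Exception:
        ok = False
    return (default_timer() - start) * 1000.0, ok


class SlowStreak(object):
    """Counts consecutive slow or failed probes (hypothetical helper)."""

    def __init__(self, threshold_ms=100.0, required=3):
        self.threshold_ms = threshold_ms
        self.required = required
        self.count = 0

    def update(self, elapsed_ms, ok):
        """Record one sample; return True once `required` bad samples occur in a row."""
        if not ok or elapsed_ms >= self.threshold_ms:
            self.count += 1
        else:
            self.count = 0  # a healthy sample resets the streak
        return self.count >= self.required
```

A timer script could call `streak.update(*timed_call(probe))` on each tick and only act when it returns true. The threshold of 100ms and the streak length of 3 are assumptions, not recommended values.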
The backup occurs outside of the VM’s processing, so I don’t think adding more cores will resolve the issue, and it may potentially exacerbate the situation because of the way some hypervisors schedule execution.
Outside of the backup task, the load runs between ~30-40%; when the backup process begins, it starts spiking between 25-80%. What I suspect is happening is a scheduling issue within the hypervisor, where the VM is ready to run but the host doesn’t have the resources available. It also seems to be causing other issues on the gateway network, where comms time out and DB queries then silently hang, causing the connection pool to max out and fault.
It’s definitely a band-aid solution, and I was hoping to find something that isn’t as dangerous as triggering an external bat file to restart the server instead of forcing a failover.
I meant stopping and starting the service, which should trigger the failover. I suppose you could also do this with Windows Task Scheduler if you know when the backup starts and ends, but it is still something I wouldn’t want to do.