Redundant Ignition system, MySQL latency/issues (vsat)

SBL_MC · October 5, 2020, 8:59am

Hi all,

We have a running redundant system onboard a vessel with very limited bandwidth (vsat), connected to 3 MySQL servers. Local, backup and remote. The remote SQL server is hosted in the AWS cloud. Both servers are always connected to all SQL servers.

As the connection goes down and varies a lot I made a gateway script running every 10 minutes checking the connection, disables is if not equal valid, and enables if status is disabled.

What we very often see is that the primary ignition server does not succeed in connecting, while the backup does. Still, on the MySQL client connections I can see two valid connections.

Is there a way to “disable” SQL connections on the secondary as long as primary is present?

Is there any recommendations regarding SQL connection parameters on such narrow band connections? I tried to play with time out etc, but is should be plenty. I can connect to the gateway interface and designer, but its slow.

The log tells me nothing about why it cant connect.

Thanks all.

pturmel · October 5, 2020, 11:11am

Client connections to databases are very sensitive to both dropped packets and latency, which are both terrible for satellite connections. You probably should consider a multi-master MariaDB setup, with Ignition only connecting to the local pair, and those propagating the changes as comms permit to the third server. (The local MariaDB servers should not be on the same computers as the Ignition gateways.)

SBL_MC · October 5, 2020, 1:39pm

Maybe the primary is failing as it s trying to send data from the S&F buffer. I do still find it a bit odd that it just says “Connecting” while the backup goes straight to “Valid”. On Friday when I implemented the gateway script I could toggle “Enabled” and it was working for almost two days.

Propagating is also possible between MySQL servers, but I would think it still need a proper connection?

Can you please elaborate why we should not run SQL servers on the same computers as the gateways?

Kyle_Chase · October 5, 2020, 3:02pm

What type of data are you sending? Just history data, or are you running general queries as well?

SBL_MC · October 5, 2020, 3:19pm

Only queries. Run prep update with SF at a very slow rate.

pturmel · October 5, 2020, 4:18pm

Yes, but replication is designed to operate with connections that come and go. They will also transmit data in a bulk form that is more efficient with high-latency connections.

Mainly because databases are memory and CPU hogs. Also to minimize the recovery disruption when broken parts are restarted.

SBL_MC · October 6, 2020, 8:40am

Store & Forward, it looks to me that we can configure it to transmit data in bulk.

I tried to set up schedule pausing the transfer, but I can’t get a valid connection from the primary.

bschroeder · October 6, 2020, 1:01pm

What version of MySQL are you connecting to? Have you updated to the latest JDBC driver for MySQL?
Any errors in the MySQL logs that correspond to the event?

SBL_MC · October 7, 2020, 6:35am

AWS RDS is running MySQL engine 5.7.17

2020-10-07T06:18:56.598263Z 99873 [Note] Aborted connection 99873 to db: ‘removed’ user: ‘removed’ host: ‘removed’ (Got an error reading communication packets)

This is what I get from the log.