PLC Tag Polling Issues During Heavy Network Load

I have a client with a very large SQL database for tag history using the tag historian module. Tag history and everything else are working fine, but they have set up a scheduled task on the SQL server to back up the database in the early AM every day.

The SQL server and Ignition Gateway are separate VMs on the same host server.

While the backup runs, I get comm loss alarms from every PLC that Ignition is connected to. These alarms are driven by a PLC "heartbeat" tag and fire when its value doesn't change for a set amount of time. Once the backup is done, everything goes back to normal.
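For anyone following along, here is a minimal Python sketch of that kind of heartbeat check. It is not Ignition's actual alarm logic; the class name, the timeout values and the sample data are all made up for illustration. It shows why a stalled poll is indistinguishable from a dead PLC: the gateway only sees a value that has stopped changing.

```python
class HeartbeatMonitor(object):
    """Flags comm loss when the observed heartbeat value has not changed
    for more than timeout_s seconds. Hypothetical helper, for illustration only."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_value = None
        self.last_change = None

    def update(self, value, now):
        """Record the value seen at time `now` (seconds); return True on comm loss."""
        if self.last_change is None or value != self.last_value:
            self.last_value = value
            self.last_change = now
        return (now - self.last_change) > self.timeout_s


def simulate(timeout_s, samples):
    """Run (time, observed_value) samples through a fresh monitor."""
    monitor = HeartbeatMonitor(timeout_s)
    return [monitor.update(value, t) for t, value in samples]


# Assumed scenario: the PLC keeps counting, but the gateway's view of the tag
# is frozen at 3 from t=2 until t=15 because polling is stalled.
SAMPLES = [(0, 1), (1, 2), (2, 3), (5, 3), (10, 3), (14, 3), (15, 16), (16, 17)]

print(simulate(10, SAMPLES))  # alarm trips only at t=14, clears at t=15
print(simulate(30, SAMPLES))  # a longer timeout rides through the stall
```

With a 10 second timeout the alarm trips at t=14 even though the PLC never stopped counting. With a 30 second timeout the same stall produces no alarm, at the cost of slower detection of a real comm loss.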

I believe the bottleneck is one of the network switches, and we are trying to get the client to upgrade, but I would also like to know whether we have other options and/or whether this is expected behavior.

I am not familiar enough with SQL and backups to know if there's something we should suggest that the client change in the backup settings/schedule.

It might be interesting to get a series of thread dumps while the SQL server backup is running.
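A rough sketch of how a series could be collected, in case it helps. The function name, file naming and parameters here are my own invention. The dump source is passed in as a callable; in an Ignition gateway-scoped script I'd expect `system.util.threadDump` to be usable there, but treat that as an assumption and check the scripting docs for your version.

```python
import os
import time


def collect_dumps(dump_fn, out_dir, count, interval_s,
                  sleep_fn=time.sleep, clock=time.time):
    """Call dump_fn() `count` times, `interval_s` seconds apart, and write
    each returned string to its own timestamped file. Returns the file paths."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    paths = []
    for i in range(count):
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(clock()))
        path = os.path.join(out_dir, "threaddump-%03d-%s.txt" % (i, stamp))
        with open(path, "w") as f:
            f.write(dump_fn())
        paths.append(path)
        if i < count - 1:
            sleep_fn(interval_s)  # no wait after the final dump
    return paths
```

Started shortly before the backup kicks off, something like `collect_dumps(dump_fn, some_dir, 30, 10)` would give thirty dumps spread over roughly five minutes, which should be enough to see what the driver and historian threads are waiting on.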

Suggest that your client set up streaming replication to another SQL Server, preferably on the same network as the backup infrastructure. Then take your backups from that replica instead of the live server.

I'm not terribly familiar with Microsoft's streaming replication, but if you are using a competent database, a tool like Barman will not only run full backups on various schedules and with various retention policies, but also maintain a live pseudo-replica via the streaming protocol, so you can do point-in-time recovery as needed.