Backup node slow start

daniel.garcia-redrue · April 1, 2019, 6:39am

Hi all,
I have the following system running:

- Ignition v7.9.7 (over Windows Server 2016)
- 79 devices (2 TCP + 77 Modbus) and 160000 tags defined
- 1 database connection
- Redundancy node: warm mode, buffer size set to 30000
- Current gateway backup file size: 63MB

My problem is that when I restart my backup computer, the gateway remains in “STARTING…” for about 40 minutes. After that, when I check the devices detected on the backup node, I notice that some of them have been lost (sometimes I have 44 devices, sometimes 19, sometimes only 1…).
This is a problem for me because if my master node fails, the redundant node won’t be able to read all my variables.
To work around this, I have to re-save every lost device from the master node. After saving a device, it appears again on the backup node. But this is not a real solution because the system is in a production environment.

Any idea?

Thanks in advance

pturmel · April 1, 2019, 12:11pm

Wrapper logs? Network details?
The severity of your problem suggests you need remote access help from IA support. This forum is not official support.

daniel.garcia-redrue · April 1, 2019, 1:44pm

I have two virtual machines hosted in two different buildings. Here are the logs:

wrapper.log.zip (470.9 KB)

pturmel · April 1, 2019, 2:07pm

I didn’t look through everything, but I noticed an expression tag running listDevices that is blocking device creation. Try not doing that.

denis.palomanes · April 2, 2019, 9:33am

Hello,

We are in the same situation, and I have found a few more people with this problem. I think there is a problem in Ignition when there are a lot of devices. It seems that if the master fails, the backup cannot load all the devices…

I think this is a big problem…

daniel.garcia-redrue · April 2, 2019, 11:01am

Hi,
I’m not sure if the backup size is related to this issue. I remember that some time ago I exported the project and then imported it. The edit count reset to 0 and the backup was about 100MB smaller.
Could the problem be that the backup node is not able to sync the full project?
Which state should the projectState property show on the backup node?

Kevin.Herron · April 2, 2019, 11:53am

@pturmel is right about the listDevices - track that down and remove it and see if the backup loads.

The devices seem to be failing to load because of other contention on the internal DB.

daniel.garcia-redrue · April 2, 2019, 12:32pm

Thanks a lot for the answers @Kevin.Herron @pturmel.
I am using that to get the device status (connected tag) to detect communication failures. I am running this:
.runScript("system.device.listDevices()", 30000)

Is there a better way to get connection status of TCP and MODBUS devices? I used this because I had problems with the connected tag in the TCP devices

pturmel · April 2, 2019, 12:42pm

In an expression tag, the 30000 is ignored. That will run at the scan class interval instead. Remove it for now. Experiment with the scan class for it later. Or use a timer script instead of an expression tag. In the latter, you can defer calling listDevices until your system is running and stable.
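A rough sketch of the timer-script approach, in case it helps. It assumes the dataset returned by system.device.listDevices() has columns named Name, Enabled and State, and that a healthy device reports the state "Connected"; check both against your gateway. The helper names, the 300-second grace period and the tag path in the closing comment are made up for illustration. The 30-second interval would be configured on the timer script itself rather than passed to runScript.

```python
# Sketch of the logic a gateway timer script could run in place of the
# expression tag. Helper names and the tag path are hypothetical.

SETTLE_SECONDS = 300  # assumed grace period after gateway start


def is_settled(started_at, now, settle_seconds=SETTLE_SECONDS):
    """True once at least `settle_seconds` have passed since `started_at`."""
    return (now - started_at) >= settle_seconds


def rows_from_dataset(dataset):
    """Turn the listDevices() dataset into (name, enabled, state) tuples.

    Assumes the columns are named Name, Enabled and State.
    """
    return [(dataset.getValueAt(i, "Name"),
             dataset.getValueAt(i, "Enabled"),
             dataset.getValueAt(i, "State"))
            for i in range(dataset.getRowCount())]


def disconnected_devices(rows):
    """Names of enabled devices whose state is not 'Connected'."""
    return [name for name, enabled, state in rows
            if enabled and state != "Connected"]


# Inside the timer script (Ignition provides `system`; nothing to import):
#   rows = rows_from_dataset(system.device.listDevices())
#   names = disconnected_devices(rows)
#   system.tag.write("[default]Diagnostics/DisconnectedDevices", ", ".join(names))
```

Keeping the filtering in plain functions means it can be tested outside the gateway, and the is_settled check gives the script a way to skip listDevices while the backup node is still loading devices.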

Sanderd17 · April 2, 2019, 1:01pm

The documentation is very confusing in that regard. On one hand, the docs state that the scan class dictates "the minimum amount of time between executions". So that would mean you can always execute it slower than the scan class (enlarge the time between executions).

And on the other hand, the example seems to indicate it's the other way around.

Besides that, the now() expression also uses polling, but its documentation page doesn't mention how it relates to the scan class.

https://docs.inductiveautomation.com/display/DOC79/runScript

runScript Polling in Tags

The runScript function can be used in expression tags, but the poll rate doesn't work exactly the same as in an expression binding. All Tags have a Scan Class that dictates the minimum amount of time between each evaluation. The runScript poll rate only polls up to the rate of the scan class set on the tag.

For example, if an Expression Tag is configured with runScript to run at a poll rate of 60 seconds and is using the "default" (1 second) scan class, the Tag's Expression will still execute every 1 second. So a scan class rate of 60 seconds will be necessary for a runScript expression to poll at any rate between 0 and 60 seconds.

pturmel · April 2, 2019, 1:31pm

I can assure you that neither runScript() nor now() will honor their poll parameter in an expression tag. They will run unconditionally at the scan class pace.