Gateway Scheduled Script executed twice unexpectedly

I have a Gateway Scheduled Script on Ignition v8.1.32 set to run at midnight.

It inserts some data into a database through a named query. I didn't have logging in place before this incident, but I have added it now.

I can't figure out why it would have executed twice. My only theories are that it has to do with running a redundant server, or that my server clock got set to midnight unexpectedly. Is there anything in the logs I can look for to diagnose what might have happened?

The data gets inserted into the database with a hard-coded timestamp of midnight, since the script is only supposed to execute at midnight, so I can't diagnose the issue from the timestamp alone. The data sent to the database was different enough to indicate a gap of at least 6 or 7 hours between the first entry and the second.

I am lost on this one. Any advice / thoughts are appreciated.

Show your script. If it doesn’t already log something like “Beginning Midnight Process”, I would highly encourage adding that.

Good practice for gateway scripts is to log the start time and any parameters used, then log when the script ends along with the elapsed time. It’s often useful to know how fast a script runs.
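As a rough illustration of that logging pattern, here is a minimal sketch using Python's standard `logging` module so it runs anywhere. Inside Ignition you would more likely get the logger from `system.util.getLogger`; the logger name and the `run_logged` helper are hypothetical, not taken from the original poster's script.

```python
import logging
import time

# Hypothetical logger name; in Ignition, system.util.getLogger would
# normally be used instead of the standard logging module.
logger = logging.getLogger("MidnightProcess")

def run_logged(name, func, *args, **kwargs):
    """Run func, logging the start, the parameters, the end, and the elapsed time."""
    logger.info("Beginning %s, args=%r, kwargs=%r", name, args, kwargs)
    start = time.time()
    try:
        return func(*args, **kwargs)
    finally:
        # Logged even if func raises, so a failed run still leaves a trace.
        elapsed = time.time() - start
        logger.info("Finished %s in %.3f s", name, elapsed)
```

With this in place, two "Beginning" entries for the same scheduled run would show exactly when each execution started.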

Also, I wouldn’t hard-code the midnight timestamp, because of the exact scenario you’re running into now. Just let the database use CURRENT_TIMESTAMP, or pass system.date.now() into your named query; then you’d be seeing exactly when each run happened.
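A minimal sketch of the second option, assuming the named query accepts a timestamp parameter. The `build_params` helper and the `run_time` parameter name are hypothetical, and plain `datetime` is used here only so the example runs outside Ignition; in a gateway script the value would more likely come from `system.date.now()`.

```python
from datetime import datetime

def build_params(values, now=None):
    """Return named-query parameters stamped with the actual run time."""
    run_time = now if now is not None else datetime.now()
    params = dict(values)          # copy so the caller's dict is untouched
    params["run_time"] = run_time  # hypothetical parameter name
    return params

# In a gateway script the result would feed the named query, for example:
#   system.db.runNamedQuery("path/to/query", build_params({"total": 42}))
# The query path is a placeholder, and the exact runNamedQuery arguments
# depend on the scope the script runs in.
```

Because the timestamp is captured at execution time, a duplicate run shows up as a second row with a different time instead of two rows both stamped midnight.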

Is it an inheritable project with one child project?

After further investigation, it appears that some of the other Gateway Event Scripts also executed twice, on multiple occasions, yesterday.

I have another script that executes every 3 hours, but the duplicates only occurred at 6 pm, 9 pm, midnight, 3 am, and 6 am. All other entries as far back as two weeks ago look OK.

I highlighted the duplicate entries below in blue (these are the results of a separate gateway script). I'm not going to share the full script(s), as there is some proprietary information in them.

Are your servers actually configured as redundant and not two independent servers? Probably not but worth a check.

And inside your script at the very end does it call itself again? Sometimes when testing things in script console it's common to put the script def in there and then have to call it in order for it to run. Could be a copy paste error moving it into the library where it calls itself a second time.
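The copy-paste pitfall described above might look something like this sketch. The function name and the `calls` list are hypothetical stand-ins, and how a stray top-level call behaves in a real project library depends on when the module is loaded.

```python
calls = []  # stands in for the database inserts

def midnight_process():
    """Pretend to run the named query by recording one insert."""
    calls.append("insert")

# Left over from testing in the script console: this line runs as soon as
# the module is loaded, before anything asks for it.
midnight_process()

# The scheduled event then calls the function as intended.
midnight_process()

print(len(calls))  # 2
```

The stray call fires whenever the module is loaded, which need not coincide with the schedule, so the two inserts can land hours apart.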

Any chance someone took a gateway backup and restored it on their own machine for testing/development?

Look for logs suggesting loss of master-backup comms. This might be a split-brain scenario, where the gateways lose contact with each other, but can still talk to devices and database.

Yes, servers are configured as redundant.

I was looking at another gateway script that failed and narrowed the timeframe down to around 3:30 pm to 6:30 am the following morning. Here is a screenshot from the logs around 3:20 pm.

A lot of errors / messages that I don't recognize. Not sure if I should open an official ticket with Ignition at this point.

@paul-griffith Always a chance :)

@pturmel Yes, I am seeing some logs that might suggest this is what happened.

Look at the log on the other server for the same timeframe. See if it also thinks the connection went down. If they both are "Active" it's the split-brain scenario Phil mentioned. Which is a bad thing and causes havoc.

Are your gateways on the same subnet, physically close to each other? If not, fix that. See this:

More:

https://forum.inductiveautomation.com/search?q=%40pturmel%20split%20brain%20order%3Alatest

I'll have to reach out to the customer and get the answers from them. I was not in charge of the architecture setup or configuration and they have some weird security that limits my actions.

I'll update this thread at some point with whatever answers I find.

We have a similar issue, tbh, that I've been trying to diagnose without making much headway. Both servers are physically stacked on top of each other and on the same network, so there shouldn't be any network concerns, but they drop out of sync with similar errors every once in a while. I'm going to try updating the 3rd party modules to see if they might be the cause. Otherwise I'm not sure at the moment, and I'll need to bug support when I get time to deal with it.