1. Problem Description
We are experiencing a critical issue in which Gateway scripts (both Timer and Tag Change) hang after a specific redundancy failover sequence. This issue occurred during a planned redundancy switch. Over the course of 10 failover tests, the problem occurred 3 times.
Test Process:
- The Ignition Master and backup servers are running normally, and the Master Gateway is the active node.
- The Master Ignition Gateway is shut down.
- The backup Gateway takes control successfully, and the system operates normally on the backup.
- The original Master Gateway is restarted and the problem occurs: immediately after the Master takes back control, we observe that all Gateway scripts (e.g., Tag Change, Timer events) run incorrectly.
a. Gateway Scripts Stop Executing: On the Status > Gateway Scripts page, the "Last Execution" column for all relevant scripts permanently displays "Never".
b. Scripts Are Stuck: On the Diagnostics > Running Scripts page, all triggered scripts appear in the list, but their "Elapsed Time" continuously increases. This indicates they have been initiated but are hung and cannot complete execution.
- We then killed the failed scripts, but this did not help. If a stuck script is manually killed from the diagnostics page, any new instance of that script triggered by its event (Timer or Tag Change) also hangs immediately.
- If the Master server (now in a failed state) is restarted, the backup server takes control but also exhibits the same script-hanging behavior, suggesting that a corrupted state may have been synchronized.
2. Temporary Recovery Method
The only method we've found to restore the system to a fully functional state is a precise sequence:
- On the active (and malfunctioning) Gateway, kill all scripts from the Diagnostics > Running Scripts page.
- Shut down the backup Gateway completely.
- Restart the Master Gateway.
- Verify that the Master Gateway is running and scripts are executing normally.
- Once the Master is confirmed to be stable, start the backup Gateway.
3. System Work Environment
• Ignition Setup: A two-server redundancy configuration (Master/Backup) using the Hot-Standby mode. The synchronization state is "Good".
• Project Stability: The project has been running without issues for over 3 months prior to this event.
• Modules: All standard modules are running and activated (see attached screenshot for details).
• Platform information
Product: Ignition Platform
Version: 8.1.47 (b2025022612)
License: Standard Edition
4. Troubleshoot & Validate
We conducted some research and testing to investigate the root cause. Our analysis indicates that the problem is not with our script logic itself, but with Ignition's scripting engine when specific code constructs are present in a project script library.
Our diagnostic process was as follows:
- After the failure occurred, we first killed all stuck scripts and then disabled all Gateway Events in the project to establish a stable baseline.
- We created a new Project Script Library (e.g., "util_test").
- Inside this new library, we created a single, very simple function named read_test(). This function only contained a system.tag.readBlocking() call and returned the value.
- We created a new Timer Event set to run every second, which only called this util_test.read_test() function.
- Result: We saved the project, and this new, isolated Timer Event executed perfectly.
- Next, we added our other production functions back into the same "util_test" script library one at a time. The Timer Event was not modified and continued to call only the simple read_test() function.
- Important: We discovered that the read_test() script would hang the moment we added a function containing certain code patterns to the script library.
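For reference, the isolated test library described in the steps above could look roughly like the following sketch. This is not our exact code: the tag path "[default]Test/Value" is a hypothetical placeholder, and the optional reader parameter is an addition that only exists so the function can be exercised outside a gateway. Inside Ignition, the system object is provided by the scripting environment.

```python
# Sketch of the "util_test" project script library (hypothetical reconstruction).

TEST_TAG_PATH = "[default]Test/Value"  # hypothetical tag path, replace with a real one


def read_test(tag_path=TEST_TAG_PATH, reader=None):
    """Read a single tag and return its value.

    reader is an assumption added for testing; when omitted, the
    Ignition built-in system.tag.readBlocking is used.
    """
    if reader is None:
        # Resolved at call time, so this line only runs inside Ignition.
        reader = system.tag.readBlocking
    # readBlocking takes a list of tag paths and returns a list of
    # qualified values, one per path.
    qualified_values = reader([tag_path])
    return qualified_values[0].value
```

The Timer Event script then consists of the single call util_test.read_test().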
The code constructs that trigger this system-wide failure are:
a. The next() function used with a generator expression:
b. Even the use of a while loop
c. If we comment out these problematic sections of code and save the project, all Gateway Timer events immediately begin to run correctly again.
d. Important notes:
- The issue is not caused by the execution of the problematic code, but by its mere presence in a project script library.
- Our simple test event, which only called a basic system.tag.readBlocking() function, was made to fail simply because another function containing a next() or while construct existed in the same script library, even though that other function was never called.
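To make the two constructs concrete, here is a minimal sketch of the kind of functions involved. These are hypothetical stand-ins, not our production code; the function names and parameters are invented for illustration. In our test, functions like these only needed to exist in the library and were never called by the Timer Event.

```python
# Hypothetical stand-ins for the production functions that triggered the hang.
# Plain Python, compatible with the Jython 2.7 interpreter used by Ignition.


def find_first_match(items, predicate, default=None):
    """Return the first item for which predicate(item) is true.

    Uses next() on a generator expression (construct a).
    """
    return next((item for item in items if predicate(item)), default)


def wait_until(condition, max_attempts=10):
    """Poll condition() up to max_attempts times using a while loop (construct b).

    Returns True as soon as condition() is true, otherwise False.
    """
    attempts = 0
    while attempts < max_attempts:
        if condition():
            return True
        attempts += 1
    return False
```

Both functions are ordinary, bounded Python and complete immediately when run on their own, which is why we do not believe the script logic itself is at fault.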
5. Speculation
There may be a bug in the Ignition 8.1.47 scripting engine, possibly related to how scripts are parsed, loaded, or handled in memory during a redundancy failover and resynchronization event.