I noticed something strange in the Gateway Performance → Memory Trend chart.
Sometimes there are vertical gaps (blank areas) in the graph.
There was no gateway restart, no errors in logs, and the system didn’t lose connection. Perspective sessions continued normally.
At the same time I occasionally see warnings like:
Clock drift, degraded performance, or pause-the-world detected
Usually around 1000–2000 ms.
Current setup:
• Heap: 16 GB
• Usage usually 6–11 GB
• GC: G1 Young / Old
• CPU very low (~3%)
One important detail:
Right now everything runs on a single gateway:
• MQTT Engine / Transmission
• DB queries
• Perspective frontend
• integrations / backend logic
So basically frontend + backend + messaging all in one gateway.
Questions:
What usually causes these gaps in the performance graph if there is no restart and nothing in logs?
Are the clock drift warnings typically related to GC pauses or something at OS/VM level?
Has anyone seen improvements by splitting frontend (Perspective) and backend (tags/MQTT/DB) into separate gateways?
Just trying to understand if this is normal behavior or something worth investigating.
Sometimes the gaps in the trends are JavaScript artifacts that disappear if you reload the page. If the gap persists, it means that the internal capture of the value was interrupted.
FWIW, running the database on the same machine as the gateway is not recommended, particularly with any significant load. Databases and Ignition use CPU differently and incompatibly--the DB will end up dominating the CPU usage.
You should activate GC logging to see the actual GC pause durations, and see if they correlate with your clock drift events.
Reloading the page didn’t remove the gap on our side, so it looks like a real capture interruption (not a JS artifact).
What actually “fixed” it was restarting the Ignition service/gateway — after restart the trend was normal again.
However, we still see a small gap occasionally even when the system is basically idle (CPU ~few %, heap usage stable). So I’m trying to understand what could cause that without an obvious load spike.
Two questions:
What’s the recommended way to enable GC logging on Ignition 8.1 (where do I add the JVM args on Windows)?
Are there any other logs/metrics you’d check to confirm whether this is a GC pause / scheduler stall / IO hiccup?
Examine the GC log to find the pauses around the time of your clock drift reports. Look for pauses that significantly exceed the configured "MaxGCPauseMillis". (G1 defaults to 200 ms if the option isn't set explicitly in the JVM arguments.)
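A rough sketch of that check in Python, in case it helps. It assumes the GC log was written with Java's unified logging (something like `-Xlog:gc*:file=...:time,uptime,level,tags`, added as a `wrapper.java.additional.N` entry in `data/ignition.conf`; verify the exact argument and file location against the Ignition documentation for your version). The sample lines, the function names, and the 200 ms threshold are made up for illustration; they are not taken from a real gateway.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape (unified logging, decorations time,uptime,level,tags):
# [2024-01-15T10:23:45.123+0000][12.345s][info][gc] GC(42) Pause Young (...) 6144M->2048M(16384M) 15.234ms
PAUSE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?GC\(\d+\) (?P<kind>Pause .*?)"
    r"\s+\d+[KMG]->\d+[KMG]\(\d+[KMG]\)\s+(?P<ms>\d+(?:\.\d+)?)ms\s*$"
)


def parse_pauses(lines):
    """Return (timestamp, kind, duration_ms) for every completed GC pause line."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
            pauses.append((ts, m["kind"], float(m["ms"])))
    return pauses


def long_pauses(pauses, threshold_ms):
    """Keep only pauses longer than the given threshold (e.g. MaxGCPauseMillis)."""
    return [p for p in pauses if p[2] > threshold_ms]


def pauses_near(pauses, event_time, window_s=5.0):
    """Pauses logged within +/- window_s seconds of a clock drift warning."""
    window = timedelta(seconds=window_s)
    return [p for p in pauses if abs(p[0] - event_time) <= window]


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    "[2024-01-15T10:23:45.123+0000][12.345s][info][gc] GC(42) Pause Young (Normal) "
    "(G1 Evacuation Pause) 6144M->2048M(16384M) 15.234ms",
    "[2024-01-15T10:24:10.400+0000][37.622s][info][gc,start] GC(43) Pause Full "
    "(G1 Evacuation Pause)",
    "[2024-01-15T10:24:10.500+0000][37.722s][info][gc] GC(43) Pause Full "
    "(G1 Evacuation Pause) 11264M->5120M(16384M) 1730.412ms",
]

if __name__ == "__main__":
    pauses = parse_pauses(SAMPLE)
    drift_time = datetime.strptime("2024-01-15T10:24:11.000+0000",
                                   "%Y-%m-%dT%H:%M:%S.%f%z")
    for ts, kind, ms in pauses_near(long_pauses(pauses, 200.0), drift_time):
        print(ts.isoformat(), kind, ms)
```

If a clock drift warning has a long pause within a few seconds of it, GC is the likely cause. If the list comes back empty for most warnings, the cause is more likely outside the JVM.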
If you are getting clock drifts that don't correspond to GC pauses, then look for:
• Time warps from NTP (Windows doesn't slew for clock corrections the way competent OSs do). Consider not using Windows.
• VM hypervisor resource starvation. (Typical in IT-controlled environments where hypervisors are deliberately overcommitted to save $$.)
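One way to tell those two cases apart from outside the JVM is a small watchdog that sleeps for a fixed interval and compares wall-clock time against the monotonic clock. This is only a sketch: the function names, the 1 s interval, and the 0.5 s tolerance are arbitrary choices for illustration, and how the monotonic clock behaves while a VM is paused depends on the hypervisor.

```python
import time


def classify(wall_delta, mono_delta, expected, tolerance=0.5):
    """Classify one sampling interval; all values are in seconds."""
    if mono_delta - expected > tolerance:
        # The process woke up late: scheduler or hypervisor starvation,
        # or some other stall at the OS level.
        return "stall"
    if abs(wall_delta - mono_delta) > tolerance:
        # Wall clock moved differently from the monotonic clock:
        # the system time was stepped (for example by a time sync).
        return "clock-step"
    return "ok"


def watch(interval=1.0, samples=60, tolerance=0.5):
    """Sleep repeatedly and record every interval that is not 'ok'."""
    events = []
    wall, mono = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        new_wall, new_mono = time.time(), time.monotonic()
        verdict = classify(new_wall - wall, new_mono - mono, interval, tolerance)
        if verdict != "ok":
            events.append((new_wall, verdict, new_mono - mono))
        wall, mono = new_wall, new_mono
    return events


if __name__ == "__main__":
    # Each event is (wall-clock epoch seconds, verdict, measured interval).
    for when, verdict, measured in watch(interval=1.0, samples=60):
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when)),
              verdict, round(measured, 3))
```

Left running on the gateway server, "stall" entries that line up with the gateway's clock drift warnings would point at the OS or hypervisor, while "clock-step" entries would point at time synchronization.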
The system is currently running in production, so I can’t restart the service at the moment. When I get a chance to restart the gateway and enable GC logging, I’ll check the pauses around the clock drift times and update the thread.
OS: Windows Server 2019 (Ignition and the database are on separate servers).
Disk: Both the DB and Gateway servers have plenty of free space (around ~1TB free), so it doesn’t appear to be related to disk capacity.
After increasing the Gateway heap size, the memory situation improved. However, the system has grown quite a lot over time, so I believe the issue may now be more related to topology and load distribution rather than OS or disk.
Currently we are running a single Gateway that handles:
• MQTT Engine (data coming from multiple sites/customers)
• SQL Bridge
• Alarm system + Historian
• Perspective / reporting loads
In total, this setup serves more than 5 customers / multiple facilities from the same Gateway.
Because of that, I’m starting to think that this architecture may not be ideal anymore. I’m considering separating the system into backend and frontend gateways, something like:
• Backend Gateway(s): MQTT ingestion, Historian, Alarming, SQL Bridge, database-heavy tasks
• Frontend Gateway: Perspective sessions, reporting, dashboards
Does this approach make sense at this scale?
If anyone has experience with a similar architecture (or best practices for splitting these modules across gateways), I’d really appreciate your suggestions.
If you mean 5 unique customers, not 5 sessions all belonging to the same company, then this is an IA license breach anyway and should be split into 5 independent servers.
No, it’s not about 5 different customers. What I meant here is the overall system load. I will look into separating the backend and frontend.
Currently, I’m not receiving any errors after restarting the service. Also, I will update the configuration file with the changes Phil suggested as soon as I have the opportunity to restart the service again.
Thanks.
You will not sublicense, rent, lease, sell, trade, resell, publish, transfer, or lend the Software or the Documentation without Inductive Automation’s written consent.
3.2 Hosted Systems. You will not permit any third party to benefit from the use or functionality of the Software via a rental, lease, timesharing, service bureau, hosting service, or other similar arrangement without Inductive Automation’s written consent.
The key word I was looking for, which my IA rep mentioned years ago, was “multi-tenant”, but I don’t see it in that text.
It definitely gets into a gray area. We have a potential project with a company that has, let's say, 100 trailers, each carrying equipment with a local HMI. They want a central server from which they can operate the units remotely, or just monitor them and run reports. They don't operate these trailers themselves; they lease them out to customers, and those customers would get access to the central server to monitor all of their leased units as well. Technically they wouldn't be leasing the software to the customers, because the main system is owned and operated by the lessor of the equipment, who is just allowing the customers leasing the equipment to access their systems remotely via its server. I personally don't think this violates the license agreement but, like you, IANAL.