Running a three-tier GAN and hitting a frustrating alarm state sync issue on the front-end. Alarm state there can be stale by up to an hour until I manually toggle the remote provider.
Architecture:
Site Gateway → AWS-Backend-Prod → AWS-Frontend-Prod (Perspective)
Both hops are remote tag providers with Alarm Mode = Subscribed, Alarms Enabled. Backend security zone has Alarm Status Service Access = Allow, ack and shelving both allowed. Proxy Hops = 2 set on the site gateway.
What's happening:
Alarm state changes (new actives, acks, clears) arrive properly on Backend. A Vision client sitting directly on Backend shows real-time state correctly, so the Site→Backend leg and the alarm engine itself look fine. It's the Backend→Frontend leg where state goes stale.
The weird part: outbound acks work. Acking from the Perspective Alarm Status Table on the front-end goes through — Vision confirms it on Backend — but system.alarm.queryStatus() on the front-end still returns isAcked=False / Active, Unacknowledged for that event. So writes are propagating out, reads aren't propagating back.
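To put numbers on the drift instead of eyeballing two clients, something like the following could compare a Backend snapshot against a front-end snapshot. This is only a sketch: the function name, the event IDs, and the `(is_active, is_acked)` tuple shape are made up for illustration, and how each snapshot gets collected (for example from `system.alarm.queryStatus()` results on each gateway) is assumed rather than shown.

```python
# Pure-Python sketch; intended to run on Jython 2.7 (Ignition) or CPython 3.
# Snapshots are plain dicts: {event_id: (is_active, is_acked)}.
# Building them from real alarm query results is an assumption left out here.

def diff_alarm_snapshots(backend, frontend):
    """Return event IDs whose presence or state differs between the two tiers."""
    missing = sorted(k for k in backend if k not in frontend)
    extra = sorted(k for k in frontend if k not in backend)
    mismatched = sorted(
        k for k in backend
        if k in frontend and backend[k] != frontend[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Hypothetical data mirroring the symptom: acked on Backend, unacked on front-end.
backend = {
    "evt-1": (True, True),
    "evt-2": (True, False),
    "evt-3": (False, True),
}
frontend = {
    "evt-1": (True, False),
    "evt-2": (True, False),
}

drift = diff_alarm_snapshots(backend, frontend)
```

A non-empty `mismatched` or `missing` list while the GAN link reports healthy would match the "snapshot fine, updates silent" pattern.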
Toggling the remote provider (disable → re-enable) fixes it immediately — missing alarms appear, correct state everywhere. Same thing happens if I flip the History Access Mode between Database and Gateway Network; that also forces a re-subscription and the snapshot lands correctly. But then it drifts again. Looks like the initial subscription snapshot delivers fine on every reconnect, and then the async update channel goes quiet while the GAN link stays up and unfaulted — no errors logged anywhere.
Already tried:
- Both providers confirmed Subscribed (not Queried)
- Proxy Hops = 2 on site gateway
- Tightened Site→Backend ping settings: was 5000ms rate / 800ms timeout / 30 missed pings (~150s to fault); now 1000ms / 300ms / 10 (~10s to fault)
- Backend security zone Alarm Status verified Allow
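For reference, the time-to-fault figure for the ping settings above is just ping rate times allowed missed pings. The helper below is a hypothetical name of mine, and it assumes the per-ping timeout does not add to the total, which is how the ~150s estimate was reached.

```python
def seconds_to_fault(ping_rate_ms, missed_pings):
    """Approximate seconds before the link faults: ping interval x allowed misses."""
    return ping_rate_ms * missed_pings / 1000.0


old_settings = seconds_to_fault(5000, 30)  # original Site->Backend settings
new_settings = seconds_to_fault(1000, 10)  # tightened settings
```

Either way, the stale periods are far longer than the fault window, which suggests the link is never actually being considered down.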
Questions:
- Is subscribed alarm state actually expected to survive reliably across two GAN hops, or does it need a direct provider for the async channel to stay healthy?
- Is there a known failure mode where the subscription delivers its initial snapshot on connect then stops receiving async updates — connection looks fine, no faults logged, but updates just stop coming through?
- Any way to detect or auto-recover a dead async subscription channel without manually toggling the provider?
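On the detection question, one idea I'm considering is a heartbeat: something on the site gateway that changes on a fixed schedule and is visible through the same subscription, with the front-end flagging the channel as stale when that value stops changing. The sketch below is only the staleness logic; the class name and threshold are hypothetical, and the actual recovery step (whatever replaces the manual provider toggle) is deliberately left out because I don't know of a supported call for it.

```python
import time


class SubscriptionWatchdog(object):
    """Flags a channel as stale when an observed heartbeat value stops changing."""

    def __init__(self, max_silence_s, clock=time.time):
        self.max_silence_s = max_silence_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_value = None
        self._last_change = clock()

    def observe(self, value):
        """Record the latest heartbeat value; return True if the channel looks stale."""
        now = self._clock()
        if value != self._last_value:
            self._last_value = value
            self._last_change = now
        return (now - self._last_change) > self.max_silence_s


# Simulated run with a fake clock: the heartbeat stops changing after t=30s.
fake_now = [0.0]
watchdog = SubscriptionWatchdog(max_silence_s=60, clock=lambda: fake_now[0])
results = []
for t, value in [(0, 1), (30, 2), (60, 2), (100, 2)]:
    fake_now[0] = float(t)
    results.append(watchdog.observe(value))
```

In practice `observe()` would be called from a timer script on the front-end, and a `True` result would at least raise a local alarm or log entry until a proper auto-recovery path exists.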
On 8.1.x across all three tiers. Can pull gateway logs if helpful.