MQTT Engine: how to disable buffering of data?

321liftoff · September 18, 2021, 3:59pm

@wes0johnson I had an instance where an Edge of Network node mistakenly published NBIRTH and DBIRTH repeatedly and VERY rapidly over some amount of time (at least an hour).

We noticed that our Perspective screens, which are fed from MQTT Engine tags, were showing data that was at least an hour old. I thought the buffering might be coming from our external brokers, but this data kept arriving on the Perspective screens even after we shut down the MQTT broker that MQTT Engine was connected to.

Based on this, the buffering seems to have come from Ignition/MQTT Engine itself. The rate of the data moving to the screens seemed faster than the rate that the data came in (based on watching the data timestamps change faster than 1/second), but not fast enough for us to wait and let the MQTT Engine empty its buffer on its own. So we restarted the MQTT Engine module and it started back up clean.

Given the confusion caused by Perspective displays showing old data from tags that are intended to represent real-time data, are there any suggestions on how to improve this?

Is there some metadata from the MQTT Engine that will indicate that the data is not current and is from a buffer?
Is there a way to drop the data rather than buffer it if the rate is too much to handle?

Thanks!

wes0johnson · September 19, 2021, 4:22pm

There is a boolean tag here:
[MQTT Engine]Engine Info/MQTT Clients/SERVER_NAME/Enable Latency Check

When the above tag is true, the following will be 0 or greater:
[MQTT Engine]Engine Info/MQTT Clients/SERVER_NAME/Message Processing Latency

The ‘Message Processing Latency’ is a measurement, in milliseconds, of the time between a message arriving and that message being fully processed. It is sampled (I think every couple of seconds), so it is not exact. It is also not enabled by default, because making these calculations consumes some resources. So, this can give you a general idea of whether MQTT Engine is getting backed up or not.
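As a rough illustration of how those two tags could drive a "data may be stale" indicator on a display, here is a minimal sketch. The helper name `classify_latency` and the 5000 ms threshold are made up for this example, and treating a negative latency as "unknown" is an assumption based on the note that the value is 0 or greater only when the check is enabled.

```python
# Hypothetical helper, not part of MQTT Engine or Ignition.
def classify_latency(check_enabled, latency_ms, stale_after_ms=5000):
    """Return 'unknown', 'current', or 'stale' for one sampled latency value."""
    # Assumption: with the latency check disabled, the value carries no meaning.
    if not check_enabled or latency_ms is None or latency_ms < 0:
        return "unknown"
    # The threshold is an arbitrary example value; tune it per system.
    if latency_ms > stale_after_ms:
        return "stale"
    return "current"

# Inside an Ignition 8 script the inputs could be read along these lines
# (SERVER_NAME is the placeholder from the tag paths above):
#   base = "[MQTT Engine]Engine Info/MQTT Clients/SERVER_NAME/"
#   enabled, latency = [qv.value for qv in system.tag.readBlocking(
#       [base + "Enable Latency Check", base + "Message Processing Latency"])]
#   state = classify_latency(enabled, latency)

print(classify_latency(True, 120))
print(classify_latency(True, 3600000))
print(classify_latency(False, -1))
```

The result could be written to a memory tag and bound to a banner on the Perspective view, so operators get a hint when the displayed values may be lagging.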

I should also note that there are some system configurations that could result in this not being a great indicator of a backup. Every Edge Node’s messages must be processed in order. So, internally in MQTT Engine, even though there is a thread pool, each Edge Node’s messages are funneled through a single thread. As a result, if 99% of your messages are being processed quickly but 1% are not, the message processing latency tag could show a low/good value the majority of the time. So, be aware of this.
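To make that caveat concrete, here is a small sketch with invented numbers: 99 nodes whose messages are processed in 5 ms and one node that is an hour behind. The node names and values are hypothetical, and this does not reproduce how MQTT Engine actually samples latency. It only shows why a typical sample can look healthy while one node's queue is badly backed up.

```python
from statistics import median

def summarize(latencies_ms_by_node):
    """Summarize per-node latency samples (all values are illustrative)."""
    all_samples = [ms for samples in latencies_ms_by_node.values() for ms in samples]
    # The node whose worst sample is largest is the one that is backed up.
    worst_node = max(latencies_ms_by_node, key=lambda n: max(latencies_ms_by_node[n]))
    return {
        "median_ms": median(all_samples),  # what a typical sample looks like
        "max_ms": max(all_samples),        # what the slowest node looks like
        "worst_node": worst_node,
    }

# 99 healthy nodes at 5 ms, one misbehaving node an hour (3,600,000 ms) behind.
nodes = {"node-%02d" % i: [5] * 10 for i in range(99)}
nodes["node-bad"] = [3600000] * 10

summary = summarize(nodes)
print(summary)
```

Here the median stays at 5 ms even though one node is an hour behind, so a single sampled latency value is best treated as a hint rather than a guarantee that every node's data is current.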

At this point, there is no way to clear the Engine buffers other than a module restart or an enable/disable of MQTT Engine. I should note that we just made configuration control scriptable in the latest version (4.0.8) of many Cirrus Link modules. So, you could use this to quickly enable/disable MQTT Engine to clear any backup:
https://docs.chariot.io/display/CLD80/ME%3A+Python+Scripting

I’d also be curious how you got into this state. We have systems with MQTT Engine consuming millions of tags and very high data rates. Obviously this requires appropriately sized machines. If you have a reproducible edge case I’d be interested in seeing it.

321liftoff · September 20, 2021, 2:08am

Thank you for the additional information.

We had an Edge of Network node that, due to incorrect configuration on our end, was publishing NBIRTH and DBIRTH as fast as its CPU could handle. I’m not quite sure how long this scenario persisted, but likely 30 minutes or more. During this scenario, the Ignition Gateway was at 40% CPU (normally 5-15%), presumably due to the increased traffic from that one node.

This is obviously not a normal scenario, but revealed some areas that should be improved in our system. The systematic issues identified at this time include:
–a malformed single node can affect the larger network
–data being presented on the display has a possibility of not being “now” data, if MQTT Engine is overloaded

The confusing part was the presence of data even after the node was disconnected from the MQTT Broker… to the point where it felt like being on the receiving end of Stuxnet LOL. It makes more sense after understanding the entire scenario: MQTT Engine was processing each Birth in order, and each one takes a certain amount of time to process. That time may have worked out to be slightly shorter than the delta between the Birth message timestamps, so the backlog flushed only slightly faster than real time and felt sluggish.
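A back-of-the-envelope version of that reasoning, with made-up numbers (we did not measure the actual Birth spacing or per-message processing time): if the buffered messages are stamped 10 ms apart and each takes 8 ms to process, the tags replay at 1.25x real time, so an hour of backlog still needs 48 minutes to flush.

```python
def drain_estimate(backlog_span_s, msg_delta_ms, process_time_ms):
    """Estimate replay speed and flush time for a backlog processed in order.

    backlog_span_s: how much data time the backlog covers, in seconds
    msg_delta_ms: spacing between timestamps of buffered messages (hypothetical)
    process_time_ms: time to process one buffered message (hypothetical)
    """
    # Seconds of data time replayed per second of wall-clock time.
    replay_speed = float(msg_delta_ms) / process_time_ms
    # Wall-clock seconds until the backlog is empty, once no new data arrives.
    drain_time_s = backlog_span_s / replay_speed
    return replay_speed, drain_time_s

speed, drain_s = drain_estimate(3600, 10, 8)
print("replay speed: %.2fx, drain time: %.0f s" % (speed, drain_s))
```

That would match what we saw: timestamps advancing faster than one second per second, but nowhere near fast enough to wait out.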