DiskBackedHistoryStore: failed to drain metrics for edgeNodeId

Hi. We’re seeing some odd behaviour from Ignition today. The device is connected over a mobile connection and the signal seems unstable. Because of this, Ignition appears to have entered a loop it is unable to get out of.

On the Ignition server the device shows as disconnected, and in the log I see the exception below, sometimes followed by birth requests at a rate of about three per second.

INFO   | jvm 1    | 2020/01/09 15:32:25 | E [c.c.m.t.g.h.DiskBackedHistoryStore] [14:32:25]: failed to drain metrics for edgeNodeId=abc/ips-edge-test-oracle and deviceId=null
INFO   | jvm 1    | 2020/01/09 15:32:25 | org.h2.jdbc.JdbcSQLException: Connection is broken: "session closed" [90067-197]
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.message.DbException.get(DbException.java:179)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.message.DbException.get(DbException.java:155)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.engine.SessionRemote.checkClosed(SessionRemote.java:556)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.engine.SessionRemote.removeServer(SessionRemote.java:486)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.command.CommandRemote.executeQuery(CommandRemote.java:189)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:114)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.DiskBackedHistoryStore.drainMetrics(DiskBackedHistoryStore.java:869)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.DiskBackedHistoryStore.drainMetrics(DiskBackedHistoryStore.java:852)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.DiskBackedHistoryStore.drainNextMetrics(DiskBackedHistoryStore.java:413)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.DiskBackedHistoryStore.flush(DiskBackedHistoryStore.java:555)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.HistoryPublisher.publish(HistoryPublisher.java:165)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at com.cirruslink.mqtt.transmission.gateway.history.HistoryPublisher.run(HistoryPublisher.java:130)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO   | jvm 1    | 2020/01/09 15:32:25 |       at java.lang.Thread.run(Thread.java:748)

After this we get tons of these:

INFO   | jvm 1    | 2020/01/09 15:37:21 | W [c.c.m.t.g.m.EdgeNode          ] [14:37:21]: Birth in progress, rescheduling publish task com.cirruslink.mqtt.transmission.gateway.model.EdgeNode$NodePublishDelayTask@1e549cc
INFO   | jvm 1    | 2020/01/09 15:37:21 | W [c.c.m.t.g.m.EdgeNode          ] [14:37:21]: Birth in progress, rescheduling publish task com.cirruslink.mqtt.transmission.gateway.model.EdgeNode$NodePublishDelayTask@1c017ee

These repeat until a new exception like the one above is thrown.

Before this, we found this error in the log:

INFO   | jvm 1    | 2020/01/08 02:14:19 | W [c.c.m.t.g.h.DiskBackedHistoryStore] [01:14:19]: Maximum tag capacity is reached, the Records table has space for 0 metrics only

After a few of these were thrown, the loop above started. So it seems that after some failed store operations the history store entered a state which prevented it from working again.
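For what it’s worth, the stack trace looks to us like a periodic flush task reusing a single H2 connection after the remote session has closed. The sketch below is purely illustrative and is not Cirrus Link’s code (the class name, JDBC URL, and query are made up); it just shows why such a task would keep failing on every run unless it validates and reopens the connection:

```java
// Illustrative only: a scheduled flush task that caches one JDBC connection.
// Once the H2 TCP session closes, every statement on that connection throws
// JdbcSQLException: Connection is broken: "session closed", so the task fails
// on every run until the connection is replaced.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FlushTaskSketch implements Runnable {

    // Hypothetical URL; the real store's H2 database location is not shown in the logs.
    private static final String JDBC_URL = "jdbc:h2:tcp://localhost:9092/~/history";

    private Connection connection;

    @Override
    public void run() {
        try {
            // Recovering variant: validate the cached connection and reopen it if broken.
            // A task that skips this check keeps reusing the dead session and loops on
            // the same exception, which matches what we see in the log above.
            if (connection == null || !connection.isValid(2)) {
                connection = DriverManager.getConnection(JDBC_URL);
            }
            drainMetrics(connection);
        } catch (SQLException e) {
            // Log and let the scheduler retry on the next period.
            System.err.println("failed to drain metrics: " + e.getMessage());
        }
    }

    private void drainMetrics(Connection conn) throws SQLException {
        // Hypothetical query; stands in for DiskBackedHistoryStore.drainMetrics(...).
        try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM RECORDS LIMIT 100");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // publish the stored metric, then delete the row (omitted)
            }
        }
    }
}
```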

We had to restart Ignition, after which it resumed normal operation.

Is there something we can do on the configuration side to prevent this kind of disruption?

Thanks

@wes0johnson

This is a known issue we’re currently working on. We should have a fix in the near future.

Thanks. Is there an ETA for the fix?

I don’t have an ETA yet but it is one of our high priority tasks right now. I’m hoping it’ll be resolved in our nightly builds sometime next week.