Error pruning alarm journal data - read timed out

richardNZ15 · June 9, 2022, 2:31am

Been getting these errors in the gateway logs recently. Not sure what would have changed to cause it.
My Alarm Journal is pretty large at the moment, with inserts taking up to 2min.

AlarmJournal is configured to use the S&F system, and enable pruning to keep the last 6 months.
The tables are created in my “Process” database connection, which has a Max Wait time set to -1 (to disable timing out).
I also have the following under “Extra connection properties” for the Process connection:
zeroDateTimeBehavior=convertToNull;connectTimeout=120000;socketTimeout=120000;rewriteBatchedStatements=true;

The database connection is pointing to a MaxScale read-write-service that is load balancing 3 SQL DB servers.
I don’t believe there is a read timeout on the MaxScale config.

Where else can i look to prevent a read time out?

richardNZ15 · June 9, 2022, 3:21am

Full error message below. Could it be related to any of the following MySQL variables?
net_read_timeout = 30
net_write_timeout = 60
slave_net_timeout = 60

java.sql.SQLNonTransientConnectionException: (conn=267231) Connection timed out

at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:73)

at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:153)

at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:273)

at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:229)

at org.mariadb.jdbc.ClientSidePreparedStatement.execute(ClientSidePreparedStatement.java:149)

at org.mariadb.jdbc.ClientSidePreparedStatement.executeUpdate(ClientSidePreparedStatement.java:181)

at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:136)

at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:136)

at com.inductiveautomation.ignition.gateway.datasource.SRConnectionWrapper$SRPreparedStatement.executeUpdate(SRConnectionWrapper.java:981)

at com.inductiveautomation.ignition.gateway.datasource.SRConnectionWrapper.runPrepUpdate(SRConnectionWrapper.java:181)

at com.inductiveautomation.ignition.gateway.alarming.journal.DatabaseAlarmJournal$JournalMaintenanceTask.run(DatabaseAlarmJournal.java:508)

at com.inductiveautomation.ignition.common.execution.impl.BasicExecutionEngine$TrackedTask.run(BasicExecutionEngine.java:582)

at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)

at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)

at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)

at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

at java.base/java.lang.Thread.run(Unknown Source)

Caused by: org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException: Connection timed out

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.exceptionWithQuery(AbstractQueryProtocol.java:192)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.exceptionWithQuery(AbstractQueryProtocol.java:177)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:321)

at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:220)

... 14 common frames omitted

Caused by: java.sql.SQLNonTransientConnectionException: Read timed out

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.handleIoException(AbstractQueryProtocol.java:2035)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1520)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1499)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:318)

... 15 common frames omitted

Caused by: java.net.SocketTimeoutException: Read timed out

at java.base/java.net.SocketInputStream.socketRead0(Native Method)

at java.base/java.net.SocketInputStream.socketRead(Unknown Source)

at java.base/java.net.SocketInputStream.read(Unknown Source)

at java.base/java.net.SocketInputStream.read(Unknown Source)

at java.base/java.io.FilterInputStream.read(Unknown Source)

at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.fillBuffer(ReadAheadBufferedStream.java:131)

at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.read(ReadAheadBufferedStream.java:104)

at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:247)

at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:218)

at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1518)

richardNZ15 · June 12, 2022, 12:47am

I have tried the following which didn’t help:

Disable use of S&F system, still get same errors
Updated “connect_timeout” variable in SQL database servers themselves from 5-100s. No change.

pturmel · June 12, 2022, 12:45pm

Consider turning on statement logging on the DB side to obtain the actual statements that are causing the problem. Then run them with “Explain” in your DB’s front end to figure out why they are choking. I suspect there’s some time-consuming action that isn’t obvious from the Ignition side.

richardNZ15 · June 13, 2022, 1:13am

I suspect it is caused by this query:

I’ve run the EXPLAIN SELECT statement on the Subquery that is doing the SELECT of ‘ids’, and it is definitely using my clustered index CLIX_AlarmEvents_EEP to do the query.

Running the query itself is very fast:

SELECT id FROM AlarmEvents WHERE AlarmEvents.eventtime < @pruneDate;
/* Affected rows: 0  Found rows: 16,054  Warnings: 0  Duration for 3 queries: 0.126 sec. (+ 0.218 sec. network) */

Which means the DELETE is the query taking ages, which isn’t surprising.
I have got an Index on the AlarmEventsData table for the ‘id’ column. So i would have expected deletes purely based on ‘id’ to be fine.

Table CREATE code below:

CREATE TABLE `AlarmEventData` (
	`id` INT(11) NULL DEFAULT NULL,
	`propname` VARCHAR(500) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`dtype` INT(11) NULL DEFAULT NULL,
	`intvalue` BIGINT(20) NULL DEFAULT NULL,
	`floatvalue` DOUBLE NULL DEFAULT NULL,
	`strvalue` VARCHAR(5000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	INDEX `AlarmEventDataidndx` (`id`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
ENGINE=TokuDB
;

It is curious that the EXPLAIN for the DELETE statement shows no use of a key for the AlarmEventData table, unless i’m interpreting the output wrong… Seems like it has to loop through all 26007244 rows to do the deletion…

richardNZ15 · June 13, 2022, 1:15am

Would it be better to create a PRIMARY KEY for the ‘id’ column on AlarmEventsData table?
That’s the only way i can think of getting the query manager to use an index. I presume there would be no duplicate values for ‘id’, but not sure if Ignition would complain about the auto incrementing nature of the Primary Key.

richardNZ15 · June 13, 2022, 1:21am

Could be related to this. May need to split DELETE into smaller chunks…
Seems like something IA should do if possible.
But i don’t really know if its possible to know what the SQL Optimizer is going to do in advance.

richardNZ15 · June 13, 2022, 2:34am

DELETE FROM AlarmEventData WHERE id IN (SELECT id FROM AlarmEvents WHERE AlarmEvents.eventtime <= @pruneDate);
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 3 queries: 00:02:10.8 */

Execution worked when running directly on the master SQL Server (HeidiSQL GUI)
But now i’m a little concerned because the replication Slave servers don’t seem to have performed the deletion.

This thread seems to imply that it would be better to split the delete into smaller chunks for two reasons:

Ensure the index is used by the query optimizer
Give time for the replication slaves to deal with the smaller delete chunk, and any other replication tasks.

Unfortunately it seems that my current version of MariaDB doesn’t allow use of LIMIT in a subquery.
So it seems we’d have to almost do the Alarm Journal pruning ourselves, instead of getting IA to do it in the background. If we wanted to be able to have full control over it.