Alert storage profile error

It looks like I am getting some deadlocks in my alert storage profile table. I have a lot of devices going in and out of alarm every few seconds. Do you have any suggestions to correct this issue? I am using SQL Server 2008 and Ignition version 7.2.11-beta1 (b209).

INFO   | jvm 1    | 2012/03/20 21:11:17 | WARN  [StorageManager                ] [21:11:17,271]: Alert storage profile "New_Alert_Storage" threw an exception on receive.
INFO   | jvm 1    | 2012/03/20 21:11:17 | com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 54) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:196)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1454)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:388)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:338)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:185)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:160)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:306)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.datasource.SRConnectionWrapper$SRPreparedStatement.executeUpdate(SRConnectionWrapper.java:810)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.datasource.SRConnectionWrapper.runPrepUpdate(SRConnectionWrapper.java:139)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.alert.storage.DatasouceAlertStorageProfile.cleared(DatasouceAlertStorageProfile.java:405)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.alert.storage.DatasouceAlertStorageProfile.receiveAlert(DatasouceAlertStorageProfile.java:373)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.alert.storage.AlertStorageManagerImpl.receiveAlert(AlertStorageManagerImpl.java:308)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.gateway.alert.AlertBusImpl$Publisher.run(AlertBusImpl.java:88)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at com.inductiveautomation.ignition.common.execution.impl.BasicExecutionEngine$ThrowableCatchingRunnable.run(BasicExecutionEngine.java:526)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.FutureTask.run(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
INFO   | jvm 1    | 2012/03/20 21:11:17 | 	at java.lang.Thread.run(Unknown Source)

bump

Hi,

Probably the best thing to do would be to try to narrow down exactly why the deadlock is occurring. SQL Server has some pretty decent tools for doing this. I’ve used them before, but it’s been a while, so I can’t guide you through them directly. Check out this link, and then maybe look at similar articles for the version of SQL Server that you’re using: http://msdn.microsoft.com/en-us/library/ms188246.aspx
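One quick way to get more detail, if you’d rather not set up a full Profiler trace, is to turn on the deadlock trace flags so that the deadlock details get written to the SQL Server error log. Something along these lines should work on SQL Server 2008 (the flags apply server-wide until the service restarts):

-- Write deadlock details to the SQL Server error log.
-- 1204 lists the locks and resources involved; 1222 logs the full
-- deadlock graph. The -1 argument makes the flags global.
DBCC TRACEON (1204, -1);
DBCC TRACEON (1222, -1);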

In this case, I suspect the problem is caused by the indexes on the alert log table and the way that the queries on the system are using them. On that table, there’s the primary key (id), and then separate indexes on clear time and active time.

The insert operation would need to lock all of the indexes, and then a select would lock different indexes depending on which columns were used in the where clause. If the operations overlapped, they could deadlock. You should also check the table in SQL Server Management Studio to see whether any further indexes have been defined.
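For example, something like this will list whatever indexes currently exist (this assumes the storage profile is using the default alert_log table name; substitute yours if it’s different):

-- List the indexes on the alert log table, including which one (if any)
-- is clustered and which columns each index covers.
EXEC sp_helpindex 'alert_log';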

Anyhow, getting more specific information about the deadlock would help, and maybe we can figure out which indexes to adjust to eliminate the problem.

Regards,

As far as the indexes go, they are as you described and have not been modified. Regarding select statements on the alert log, we very rarely use them in our system, unless something internal to Ignition is querying the alert log table. Plus, the deadlocks happen in the early morning hours when no users are logged on. I have attached a deadlock trace that shows a few deadlocks.

I have about 1700 devices, each with 7-8 alarm points enabled, updating once per hour. Is it possible that the system is being overwhelmed?

I changed the .trc extension to .txt, so it will need to be changed back to .trc to open in SQL Server Management Studio.
test deadlock.txt (445 KB)

Colby, have you had a chance to take a look at this?

Hey, sorry for the delay, I’m trying to get the profiler installed right now so I can take a look.

Does this only happen at night? If so, could it be possible that a DB backup is happening that locks the table for a bit?

Regards,

No, it happens a few times per hour from what I can see.

On that table, are any of the indexes set as “clustered”? In particular, is the “clear_time” index clustered? If so, change it to be non-clustered. Clustered indexes cause the rows to be stored physically according to the index, which makes locking take longer when the column needs to be updated. If none are clustered, let me know.
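For reference, SQL Server doesn’t let you convert an index in place; you drop it and recreate it as non-clustered. Roughly like this (the index and table names here are just placeholders; use whatever sp_helpindex reports for your table):

-- Drop the clustered index on the clear-time column and recreate it as
-- non-clustered (names are examples only; match them to your schema).
DROP INDEX clear_time_ndx ON alert_log;
CREATE NONCLUSTERED INDEX clear_time_ndx
    ON alert_log (cleared_timestamp);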

Regards,

The only clustered index is alert_log_ndx. The other two, active_timestamp and cleared_timestamp, are non-clustered.

I would try making them all non-clustered to see if that helps.

I don’t think that making the primary key index non-clustered will help. While I haven’t been successful in getting that deadlock trace opened graphically, from looking at it in a text editor, it appears everything centers around the query to record alert clears. I suspect that that query isn’t executing very quickly, and if we can improve it, the deadlocks will go away.

If you have internet access on that machine, we can speed this along with GoToMeeting. Otherwise, what I would suggest is running the “display query plan” option on a sample “alert clear” query, seeing what it reports, and then creating a new index to help it.

The query to clear alerts is basically:

UPDATE alert_log SET cleared_value=?, cleared_timestamp=? WHERE system=? AND path=? AND state_name=? AND cleared_timestamp IS NULL

So, the hope is that the index on cleared_timestamp will be used to find just the null records (which should be relatively few), and then only those will be scanned for the other conditions. However, that may not be happening here. Try running the “display execution plan” on:

UPDATE alert_log SET cleared_value=0, cleared_timestamp=CURRENT_TIMESTAMP WHERE system='test' AND path='test' AND state_name='test' AND cleared_timestamp IS NULL
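If it’s easier to do this from a query window than through the toolbar button, you can get the estimated plan as text without actually running the update, roughly like this (again assuming the default alert_log table name):

-- Show the estimated execution plan instead of executing the statement.
SET SHOWPLAN_TEXT ON;
GO
UPDATE alert_log
SET cleared_value=0, cleared_timestamp=CURRENT_TIMESTAMP
WHERE system='test' AND path='test' AND state_name='test'
  AND cleared_timestamp IS NULL;
GO
SET SHOWPLAN_TEXT OFF;
GO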

If possible, take a screen shot of the results and post it. Otherwise, try to describe the results. Does it mention an index name, or does it just say “full scan”?
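If it turns out the plan is scanning rather than seeking, one thing we might try afterwards (just a sketch at this point; I’d want to see the actual plan first) is a filtered index that only covers the rows that haven’t been cleared yet, so the clear query has a much smaller set to search:

-- A possible helper index covering only un-cleared rows (column names
-- assume the default alert_log schema; this is a suggestion, not
-- something to apply blindly on a busy production table).
CREATE NONCLUSTERED INDEX ix_alert_log_open
    ON alert_log (system, path, state_name)
    WHERE cleared_timestamp IS NULL;

Filtered indexes are supported starting with SQL Server 2008, so that option should be available to you.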

Regards,

We have a busy few days coming up, so it will be next week before we can check it out. I’d prefer to let one of you look at it via GoToMeeting if possible. Should I ask for you or just get with one of the other support guys?

Just get one of the support guys and have them look at the post. They will know what to do.