SQL deadlocks on accessing sqlth_sce

We’re expriencing quite some DB errors due to deadlocks in one of our projects (see screenshots below).

After executing DBCC TRACEON (1204, -1) on the SQL server, to trace the deadlocks, I was able to get a log from it: deadlock.log (11.5 KB)

It appears that a read and a write to the sqlth_sce table are causing the issue. But I don’t get when or why the system will use that table. What can we investigate to find the root cause of these errors?

This system is running 7.9.11


1 Like

Ok, after some digging, I found the issue.

Apparently, it reads and writes to that table when a scan class doesn’t get updates soon enough.

We could count the number of records per scan class, and saw that most records belonged to the “fast” scan class.

When searching the sqlth_scinfo for tags with the “fast” scan class, we found a few tags that used the scan class as historian scan class. Those appeared to be the problem, as the scan class was fast, it was stale a lot sooner, and caused a lot more new records and updates.

The tags are communication bits, so they don’t change that often, but I understand some people want a quite precise timestamp on when they changed. Those tags are now redefined to evaluate on change, hopefully that’s better.

That said, the sqlth_sce table can use a better design IMO, there’s no clustered index or primary key on the table, so I guess that’s why the select and update queries took longer than expected. I guess the combination of scid and start_time can be used as a primary key?

I’m having a similar problem with lots of deadlock errors caused by the sqlth_sce table. There are over 100,000 records in my table. This seems a bit much. Why does it create new records? How do I limit the number of new records it creates?

Also, I’m thinking of pruning it to allow it to run more efficielty. Can anyone offer advice on the best way to do this?

After the deadlocks were solved, I took a new look.

Apparently the sqlth_sce table gets updated periodically (716 times in the hour these queries were recorded).
But it shouldn’t create new records in most cases.

New records are only created when a historian tag goes stale. So to avoid this, you should make sure the timing on historian tags is forgiving enough for your network and hardware.The default of 30s seems to work well enough for us (on a local network, with 10 devices connected and 5000 tags on the historian). But the issues above were caused by having tags on the “fast” scan class.

However, even with it not creating excessive records (there are currently 203 records in that table), the queries still take a lot of time. That table is used for inserting into the historian table (sqlt_data_*), so the lack of index certainly matters.

As the records are inserted by start_time, and the most heavy queries using the sqlth_sce table filter or sort on start_time, I thought adding a clustered index on that column wouldn’t hurt.

Let’s see what’s it doing next, it certainly looks promising with the average duration dropping from 15 ms to 1.6 ms.

2 Likes

After adding a clustered index, it looks like that update query completely disappeared from the heavy queries.

However, now there’s one query that clearly uses the most CPU time. It’s not taking exessively long (1.54 ms on average), but it just gets executed a lot (19470 times in the last hour).

It seems to be querying if the historian provider was up on a certain time or time range.
But I wonder why it’s being executed that often? We only have a few trends visible by default.

Why does it have to run that often? Apart from the most recent record, the data should be static.