Duplicate sqlth_te entries?

I was investigating some weird history query results today and stumbled on this fun tag:

MariaDB [ignition]> SELECT * FROM sqlth_te WHERE tagpath = 'edge nodes/demo/352656102543801/dev1/report count';
+-------+---------------------------------------------------+------+----------+-----------+---------------+---------------+
| id    | tagpath                                           | scid | datatype | querymode | created       | retired       |
+-------+---------------------------------------------------+------+----------+-----------+---------------+---------------+
|  5301 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 | 1587835131426 |          NULL |
| 29434 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |        487656 | 1616766361122 |
| 29439 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |          2481 | 1616770886795 |
| 30028 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |          2415 | 1617970448112 |
| 30037 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |          2402 | 1617970988858 |
| 36046 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |           446 | 1624476523658 |
| 36047 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |        107316 | 1624476729211 |
| 38245 | edge nodes/demo/352656102543801/dev1/report count |    2 |        0 |         0 |          2429 | 1626193664074 |
+-------+---------------------------------------------------+------+----------+-----------+---------------+---------------+
8 rows in set (0.122 sec)

I can explain the really old timestamps. That was our fault; we were generating malformed SparkplugB payloads.

I can’t explain the multiple entries.

Is this normal? How might this have happened? Does the historian query all of them when I ask for the history of that tag path?

Are you using the storeTagHistory function? When a value is stored through this function, it expects the value’s quality timestamp to be after the “created” timestamp in the database. One trick is to set all created timestamps to zero after the initial creation of a historical tag in the Designer, and then restart the Tag Historian in the gateway. This lets you use storeTagHistory to backdate any value without creating a new id for the same tagpath. If you’re not using the Designer and are just writing the tag to history (virtually), I’m not sure how it works, but you might want to make sure the timestamps arrive from oldest to newest.
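
In SQL, the trick looks something like this (assuming direct access to the historian schema; scope the WHERE to the paths you actually care about):

-- Zero the created stamp so later storeTagHistory calls with older
-- quality timestamps reuse the existing id instead of spawning a new one.
UPDATE sqlth_te SET created = 0 WHERE retired IS NULL;

Then restart the Tag Historian module in the gateway so it reloads its tag cache.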

Yes, as long as the other columns’ values (querymode, datatype) are the same, it should put them all together in one query.
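
If you want to check whether any of your duplicates would break that grouping, something like this (an untested sketch) should flag paths whose rows disagree on those columns:

-- Duplicate paths whose rows disagree on datatype or querymode;
-- those can't be folded into a single history query.
SELECT tagpath, scid
  FROM sqlth_te
 GROUP BY tagpath, scid
HAVING COUNT(DISTINCT datatype) > 1
    OR COUNT(DISTINCT querymode) > 1;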

One other important thing to look for, though hopefully it’s rare. We had a corrupt tag provider a year or so ago (v8.0.XX); we deleted the old one, restored from backup, and renamed the restored provider to match the old one. The historian didn’t like that and duplicated all tags, leaving us with duplicate non-retired historical paths. This slowed our historian’s performance significantly. I created a script to clean it up, but it took a while. Sometimes it’s best not to mess with things too much, but we learned a lot.

We aren’t using the storeTagHistory function directly, but I suspect the MQTT Engine module does effectively the same thing when it receives a SparkplugB tag with is_historical=True and is set up with “Store Historical Events” enabled.

Now that we’ve fixed the bugs in the source of our Sparkplug messages, I expect it to be very unlikely that we’ll receive retroactive timestamps anymore.

Your comment about the duplicate non-retired paths worries me a bit, though. I suspect we may have some of those, since we have been beating the snot out of MQTT Engine with unusual Sparkplug traffic in the past. I may go looking to see if we have any. Is cleanup just forcefully retiring one of the paths? It should probably be done while the historian module is disabled…
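
If I do find any, I’m guessing the forceful retire would look something like this (a sketch only, not tested; which row to keep is a judgment call, and I’d back up sqlth_te first):

-- Retire all but the lowest id for any path with multiple live rows.
-- retired is epoch milliseconds, like the values in the table above.
UPDATE sqlth_te t
  JOIN (SELECT tagpath, scid, MIN(id) AS keep_id
          FROM sqlth_te
         WHERE retired IS NULL
         GROUP BY tagpath, scid
        HAVING COUNT(*) > 1) d
    ON d.tagpath = t.tagpath AND d.scid = t.scid
   SET t.retired = UNIX_TIMESTAMP(NOW(3)) * 1000
 WHERE t.retired IS NULL
   AND t.id <> d.keep_id;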

I ran across this post today and decided to do some investigating on my tag historian instance.

I found that about 60% of the entries in the sqlth_te table were duplicates of currently non-retired tags. It seems the Tag Historian has some issue where it creates and retires some tag paths multiple times, even though the tag configuration has a continuous historical time frame.
Most of these duplicate entries had a retired timestamp, but in about 10% of the duplicate sets, more than one entry had the retired field set to NULL.
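
For anyone who wants to check their own instance, a query along these lines (not my exact one) will surface the duplicate sets and how many rows in each are still live:

-- Duplicate tag entries, with a count of non-retired rows per set.
SELECT tagpath, scid,
       COUNT(*)             AS total_rows,
       SUM(retired IS NULL) AS live_rows
  FROM sqlth_te
 GROUP BY tagpath, scid
HAVING COUNT(*) > 1
 ORDER BY total_rows DESC;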

Instead of trying to track down the root cause, I decided to just create a script to consolidate all of these duplicates, as @rpavlicek mentioned above.

Code removed because it is cursed

It's admittedly pretty slow for a large number of tags, but it gets the job done; it could probably be made faster by batching some of the queries together.
The script essentially groups all the duplicates and consolidates each set down to one entry with the oldest 'created' value and the newest 'retired' value (NULL if any entry in the set is still NULL).
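
Since I pulled the code, here is a rough SQL sketch of the same consolidation, using the ids from the first post. The partition table name sqlt_data_1_2021_06 is a placeholder; repeat the repoint for every sqlt_data_* partition that holds rows for these ids.

-- 1. Repoint history rows from the duplicates to the surviving id.
--    Beware: if the partition's primary key is (tagid, t_stamp), two ids
--    that stored a value at the same instant will collide here.
UPDATE sqlt_data_1_2021_06
   SET tagid = 5301
 WHERE tagid IN (29434, 29439, 30028, 30037, 36046, 36047, 38245);

-- 2. Give the survivor the oldest created stamp of the set. It keeps
--    retired = NULL because a row in the set is still live. Note: with
--    malformed stamps like the ones above, MIN may be a junk value.
UPDATE sqlth_te
   SET created = (SELECT mc
                    FROM (SELECT MIN(created) AS mc
                            FROM sqlth_te
                           WHERE tagpath = 'edge nodes/demo/352656102543801/dev1/report count'
                             AND scid = 2) t)
 WHERE id = 5301;

-- 3. Drop the now-orphaned duplicate rows.
DELETE FROM sqlth_te
 WHERE id IN (29434, 29439, 30028, 30037, 36046, 36047, 38245);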

I ended up going from ~425k entries in this table to ~170k. I also found that some of the most affected tags saw a significant reduction in query time for longer-period history calls. Some tags had 85 duplicate entries, and their historical queries went from ~20 seconds to ~2 seconds.

As I said, I'm not entirely sure what is causing the duplicate entries, but I'm content with just cleaning up the table by running this script on a schedule. It does seem like something IA should look into, in my opinion, as the performance lost to this issue could be huge, as seen in my results.

I haven't seen any negative side effects from my script at this point, so I believe it's safe enough, but anyone reading this probably shouldn't blindly use the script without doing more research on the historian tables themselves.


I recall at one point there being a bug where tags weren’t properly retired.