Historian tag entry duplicates

Hello Folks,

I suspect I already know the answer, but I'll ask anyway.

First, some context:

  • I'm historizing tags manually with system.tag.storeTagHistory. NONE of them have history enabled in their configuration (see the sketch after this list).
  • Tags can be renamed and moved around through a WebDev API.
  • I make sure their history paths (in sqlth_te) are updated to reflect these changes.
  • All tags are memory tags, and their values may also come through that API.
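
For reference, the storage call looks roughly like this (provider names and paths are placeholders, and I've stripped the real logic):

```python
# Jython, gateway scope. Illustrative only: provider names and
# paths are placeholders, not my actual configuration.
paths = ["folder1/foo"]
values = [42]

# storeTagHistory bypasses per-tag history configuration entirely;
# the historian creates/looks up sqlth_te rows keyed on these paths.
system.tag.storeTagHistory(
    "historyProvider",  # history provider (datasource) name
    "default",          # realtime tag provider name
    paths,
    values,
)
```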

While moving and renaming tags works fine, if I send a value to a tag after it has been renamed or moved, a duplicate entry is created in sqlth_te.
My guess is there's an internal cache that maps tag paths to tag ids, and this cache is not updated when I change tag paths with a script. Then, when a value is received with the new path, that path doesn't exist in the cache, so a new entry is created, which results in a duplicate.
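
You can see the duplicates directly with something like this (assuming the standard sqlth_te schema; "history" is my datasource name):

```python
# Flag live (non-retired) sqlth_te rows that share a tagpath.
dupes = system.db.runPrepQuery("""
    SELECT tagpath, COUNT(*) AS n
    FROM sqlth_te
    WHERE retired IS NULL
    GROUP BY tagpath
    HAVING COUNT(*) > 1
""", [], "history")
for row in dupes:
    print row["tagpath"], row["n"]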

So, here's the question: how can I handle that?
If my guess is right, can I update the cache myself? Can I ask the historian to force a refresh of its cache? Can I disable it entirely, and at what cost?
If I'm wrong, what causes this, and how do I fix it?

Or am I doomed to write some code to scan the historian tables and merge duplicates on a schedule?
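
The merge itself wouldn't be a lot of code, something like this (totally untested sketch, assuming the standard sqlth_partitions table and that the duplicate pair has already been identified):

```python
# Repoint data rows from the duplicate sqlth_te id to the original id,
# then drop the duplicate metadata row. Untested sketch.
def merge_duplicate(original_id, duplicate_id, db="history"):
    partitions = system.db.runPrepQuery(
        "SELECT pname FROM sqlth_partitions", [], db)
    for p in partitions:
        # Partition names come from the historian itself, so string
        # formatting the table name here is (relatively) safe.
        system.db.runPrepUpdate(
            "UPDATE %s SET tagid = ? WHERE tagid = ?" % p["pname"],
            [original_id, duplicate_id], db)
    system.db.runPrepUpdate(
        "DELETE FROM sqlth_te WHERE id = ?", [duplicate_id], db)
```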

edit:
I tried sending a value after disabling and restarting the historian, and it didn't create a new entry.
I believe this would confirm the cache hypothesis.

I absolutely would expect the historian to cache metadata, and not to expect outside entities to muck around with its permanent storage.

Seems to me that your manual historization should use your own tables.

Yes, that's what I'm starting to consider.
I wanted to take advantage of the historian's partitioning...

Does the duplicate entry actually break any trends if you update the old "retired" tagpath to match the new one? If the retired and created timestamps match up correctly and the tagpaths are the same, then I believe the historian trending functions should handle 2 different entries.
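
Something along these lines, i.e. copy the new row's created timestamp into the old row's retired column and align the paths (sketch; old_id/new_id are whatever you resolved the duplicate pair to):

```python
# Stitch a duplicate pair together: the old row gets the new tagpath
# and a retired timestamp equal to the new row's created timestamp.
row = system.db.runPrepQuery(
    "SELECT created FROM sqlth_te WHERE id = ?", [new_id], "history")
system.db.runPrepUpdate(
    "UPDATE sqlth_te SET tagpath = ?, retired = ? WHERE id = ?",
    ["folder2/foo", row[0]["created"], old_id], "history")
```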

I've heard good things about TimescaleDB (GitHub: timescale/timescaledb), a time-series database for high-performance real-time analytics packaged as a Postgres extension.

I don't recommend this approach for anything in production that you plan to support or maintain, because, as already stated, it is unsupported and could break at any time.

However... If you really need the historian to update its cache, you can just restart the historian. As in, edit the config on the gateway and save it again. If you watch the logs, you'll see these entries:
[screenshot: historian restart entries in the gateway logs]
which should make it use what you updated in the table.

Or if you want an even less stable solution, you could automate it with this:

But again, not something I'd put in production...

The "old" and "new" have the exact same tagpath. That's my issue.

1. I have a tag at path "folder1/foo".
2. I store a value for it, so an entry is created in sqlth_te.
3. Now, I send a request to the API I made to change its path. Let's make it "folder2/foo". The endpoint does a bunch of things, including changing the tag path in sqlth_te (see the sketch below). It is now "folder2/foo".
4. Now I send a new value to store for this tag. The path I use matches the path in the historian.
5. But because the tag was cached under a different path, we get a cache miss, and a new entry is created with the same path: "folder2/foo".
6. Now I have 2 entries in sqlth_te with that same path.
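
For reference, the path change in sqlth_te is essentially just this (simplified; the endpoint does more around it):

```python
# What the rename endpoint does to the historian metadata (simplified):
# the live row is updated in place; nothing is retired or created.
system.db.runPrepUpdate(
    "UPDATE sqlth_te SET tagpath = ? WHERE tagpath = ? AND retired IS NULL",
    ["folder2/foo", "folder1/foo"], "history")
```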

There is no retired timestamp. The initial tag was never retired.

This whole thing doesn't happen if I restart the historian; I've tried that in an effort to confirm that it's a cache issue.
But all of this needs to be done programmatically, and might happen several times per second.

I am not shocked that your own manipulation of the metadata is simply not reflected in the historian. :man_shrugging:

Yes, I'm not surprised either, but I hoped one of you would have a magical solution.

What if you manually retire an entry whenever you change its path? Expect the cache miss and let the historian create a new entry. From my testing, trending should still work. Are the duplicate entries actually a problem that you have to solve?

I understand the downside: every time you'd "rename" a tag you'd get an additional entry in sqlth_te. But unless you expect to rename the same tag hundreds of times, and as long as trending works, is it a big deal?
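
In other words, the endpoint would stop rewriting tagpath and instead do something like this (sketch; millisecond epoch assumed, which is what the historian stores):

```python
# Retire the old row instead of editing its tagpath; the next
# storeTagHistory call under the new path then hits the expected
# cache miss and creates a fresh sqlth_te row on its own.
now = system.date.toMillis(system.date.now())
system.db.runPrepUpdate(
    "UPDATE sqlth_te SET retired = ? WHERE tagpath = ? AND retired IS NULL",
    [now, "folder1/foo"], "history")
```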

I also retrieve data manually.
I guess I could use that in my queries, but it seems to add another layer of complexity.
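
It's doable; every read just has to fan out over every id that ever carried the path, something like this (sketch, with a data partition hardcoded for brevity; real code would consult sqlth_partitions first):

```python
# Gather history across all sqlth_te ids that share the path,
# retired or not. Partition table hardcoded for brevity.
rows = system.db.runPrepQuery("""
    SELECT d.t_stamp, d.floatvalue
    FROM sqlth_1_data d
    JOIN sqlth_te te ON te.id = d.tagid
    WHERE te.tagpath = ?
    ORDER BY d.t_stamp
""", ["folder2/foo"], "history")
```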

I think the sane option is the one Phil suggested in his first answer, but I have yet to bounce that off the people who decide, and I'm not sure it'll fly.

If you're also retrieving data manually then I would completely agree with Phil and just implement the historian yourself.

I don't know about timescaledb, but in terms of time-series databases I have used QuestDB, and the partitioning of a table is all handled automatically in the background by the database. You create a table, say that you want it partitioned by some period of time, and to you (as the user) it will just look like one table even though it's partitioned in the background. It's very easy. You can also just use SQL to interact with it even though it's not a relational database. QuestDB is open source and is actually the database that IA's Power Historian in 8.3 is built on top of.
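
To give an idea, this is roughly all it takes (sketch over the Postgres wire protocol with the default credentials; adjust for your setup):

```python
import psycopg2

# QuestDB speaks the Postgres wire protocol on port 8812 by default.
conn = psycopg2.connect(host="localhost", port=8812,
                        user="admin", password="quest", dbname="qdb")
with conn.cursor() as cur:
    # One logical table; QuestDB maintains the daily partitions
    # behind the scenes, keyed on the designated timestamp column.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS tag_history (
            tagpath SYMBOL,
            value DOUBLE,
            ts TIMESTAMP
        ) TIMESTAMP(ts) PARTITION BY DAY
    """)
conn.commit()
```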

A downside of QuestDB is that, at the moment, the data is not compressed (as it is in TimescaleDB), so it's not as efficient space-wise. Though I heard they're working on remedying that later this year with complete Parquet support...


Hey Daniel! QuestDB Developer Advocate here.

For compression, something you can do is mount a ZFS filesystem with compression enabled and move your QuestDB root folder there. We've seen some significant compression ratios, with performance being equivalent or better, as QuestDB needs to read less data from disk.

More info at Enable compression with ZFS | QuestDB

Other than that, you are right that we will be releasing native parquet support later this year :slight_smile:
