EDIT: This is Ignition EDGE. So no databases can be involved. Otherwise I’d just write to that and be done with it.
I’m doing some stuff with scripting and a dataset tag. I do have pruning in my scripting, but it’s based on a timestamp (everything older than X days ago gets deleted). But if I have a particularly “busy” site where lots of events happen, I’m concerned about datasets growing so big they affect performance, since datasets are immutable: every modification means pulling the whole dataset in, recreating it with the new data, and writing it back out. Not necessarily a huge deal when there are a couple hundred rows, but I’m sure at some point (thousands of rows? tens of thousands? millions?) that could become an issue.
I’m pretty sure I’m going to have to impose some kind of limit on the number of records in these dataset tags, but I don’t really know what that should be, or even where to start. Initially I’m thinking something like, 10,000 rows. Whenever the dataset hits that, the oldest records get pruned off regardless of age. Is that a reasonable limit? Can I go more? Should I not go so high?
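For what it’s worth, the combined age-plus-row-count pruning could look something like this plain-Python sketch. The names and limits here are hypothetical, and it operates on a list of `(timestamp, value)` tuples rather than an actual dataset; in a real gateway script you’d rebuild the dataset tag with `system.dataset` calls instead:

```python
from datetime import datetime, timedelta

MAX_ROWS = 10000       # hypothetical row cap
MAX_AGE_DAYS = 30      # hypothetical retention window

def prune(rows, now):
    """rows: list of (timestamp, value) tuples, oldest first.
    Drop anything older than MAX_AGE_DAYS, then trim to the
    newest MAX_ROWS rows regardless of age."""
    cutoff = now - timedelta(days=MAX_AGE_DAYS)
    kept = [r for r in rows if r[0] >= cutoff]
    # keep only the newest MAX_ROWS entries; the oldest fall off the front
    return kept[-MAX_ROWS:]
```

The row cap is the safety net: even if a busy site generates events faster than the age-based pruning can keep up with, the dataset can never exceed `MAX_ROWS`.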
Update: I’m running some simulation logic in the PLC project I have that pushes events very frequently. I’m up to almost 1000 rows in the dataset. So far, no increase in my script execution time (it ranges from almost 0ms to around 250ms).
I've got a dataset tag that I limit to 10,000 rows. It hasn't affected performance so far. There is a point where it DOES start to slow things down, but that point was quite a bit higher; I just decided 10,000 was more than I should EVER need for this particular use.
Event history data from the PLC. There’s a trigger tag in the PLC that I have a Gateway tag change script pointing at. It launches a “get events” script that reads the dataset for the latest timestamp, goes to the PLC array, and does direct OPC reads one by one until it hits old or empty data, then pulls whatever it found into the dataset. I’m only hitting the PLC for a few tags at a time (I don’t control the PLC program, and they have a huge array in it they’re using as a log), but the downside is that my dataset gets bigger and bigger over time. Like I said, I did put pruning on it by date, but I don’t feel good about not having row-count pruning as well.
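A rough sketch of that scan loop, with the per-slot OPC read stubbed out as a plain function (all names here are hypothetical; in the real script `read_slot` would be something like a `system.opc.readValue` call against the array element, and `last_ts` the newest timestamp already in the dataset):

```python
def collect_new_events(read_slot, last_ts, max_slots):
    """Walk the PLC's log array one slot at a time (newest first),
    collecting entries newer than last_ts. Stop at the first slot
    that is empty or no newer than what we already have.
    read_slot(i) stands in for a per-slot OPC read; it returns a
    (timestamp, payload) tuple, or None for an empty slot."""
    new_rows = []
    for i in range(max_slots):
        entry = read_slot(i)
        if entry is None or entry[0] <= last_ts:
            break  # hit old or empty data; nothing newer beyond this
        new_rows.append(entry)
    return new_rows
```

Stopping at the first old entry is what keeps the PLC traffic down to a few tags per trigger: you only ever read the slots that arrived since the last run.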
Answering purely the question at hand, from the seat of my pants with no real testing to back me up:
I would say 10^5 is about the maximum order of magnitude I'd ever put into a dataset, less if it is many columns wide. I definitely wouldn't push it more than that.
Note that you're almost certainly going to have a bad time here down the line. You're risking data races from both the tag system and the actual dataset operations.
Interestingly, this is actually a use case we have for datasets: not being able to use something like SQLite on Edge for maintaining a small list of barcodes with destinations on an Edge screen makes you do some hacky things with a dataset instead.
Well, no. I had a customer that wanted a solution on a Pasteuriser, with reporting and >35 days history.
A single Vision client, a single report, no Historian, no SQL Bridge, and some typing on the keyboard by myself to get this functionality via scripting wasn’t much dearer than an Edge Panel.
I did some testing overnight by forcing the PLC to trigger a ton of script executions and data manipulations. When the dataset has a few hundred records, script execution time is anywhere from just over 0ms to around 250ms, depending on how much it has to grab. When I came in this morning, the dataset had over 12,000 rows and counting, and script execution was varying from 130ms to around 500ms.

I ended up setting the row cap at 10,000 in my pruning code. Any row indexes over 9999 get deleted. I sort the dataset first to ensure that the oldest data is always getting deleted.

I’m comfortable with 500ms execution times. The script isn’t triggered more than once every few seconds. This is essentially a “data table” in the PLC (which, yeah, not great, but I’m not in control of that), so the data doesn’t disappear and it’s not super time-sensitive. I could literally grab new values only once every 30 seconds and that would probably be acceptable. Also, CPU and RAM usage on the Edge gateway were within acceptable levels. I could probably push it harder, but a table with 10,000 rows in a Perspective session is already pretty unwieldy and impractical, and my customer agrees.
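The sort-then-truncate part of that pruning can be sketched in plain Python like this (hypothetical names, operating on a list of `(timestamp, value)` tuples; the real script would use `system.dataset.sort` and `system.dataset.deleteRows` on the dataset tag instead):

```python
ROW_CAP = 10000  # cap chosen in the test described above

def cap_rows(rows):
    """Sort newest-first by timestamp (element 0), then keep only
    indexes 0..ROW_CAP-1, so row indexes over ROW_CAP - 1 (the
    oldest data) fall off the end."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    return ordered[:ROW_CAP]
```

Sorting before truncating is the important bit: it guarantees the rows being discarded are always the oldest ones, regardless of the order new events were appended in.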
So yeah, 10^4 is exactly where I decided to put my limit.
Base Ignition platform with no visualization is $1,200
Customer uses Perspective, and Edge gives you up to two sessions. In Standard Ignition that is $4,370
Customer is using one-way email notifications included with Edge which is extra in Standard Ignition: $2,245
Customer uses the WebDev module to display PDFs on the screen. Included in Edge, extra in Standard: $2,300
They also have a few other add-ons like MQTT Transmission, etc. But I priced out what it would take for them to have Standard Ignition with all the functions they have in Edge, and it came to over $13,000
When weighing pros and cons with customers, I regularly end up with this same result. There really is no apples-to-apples comparison. To each their own...
Thanks for sharing the findings from your test. I recall performing a similar test and vaguely remember an uneasy feeling (nothing significant) around the 10k mark. I ended up with a 1k limit.