Sum of the average values in a dataset

I want to sum the 5-minute averages of a dataset. (I have a dataset that produces a sample every second, and I need to add up the average value of every 5-minute interval.)
How can I do this?
Note: the dataset is timestamped and I am using Perspective.

Is the dataset ordered by timestamp?

Yes

You need to group by the "truncated" timestamp. I would use my groupBy() expression function (from my free Integration Toolkit), getting the truncated timestamps by converting to millis, taking the modulus of 300000, and subtracting those millis from the original timestamp.

Wrap the grouping with a sum(forEach(mean())). If you post the CSV of a sample dataset, I can work out the expression for you.

How can I find these functions? The usage of 'groupBy()' is not familiar to me.
I am trying to solve this in a script with two 'for' loops.


This is the formula:
[formula image attached]

I need to calculate the summation of every 15-minute averaged P_t.
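In symbols, the quantity described (my reading of the text above, not necessarily the exact formula in the image) is the sum of per-interval means:

```latex
S = \sum_{k=1}^{K} \bar{P}_k,
\qquad
\bar{P}_k = \frac{1}{N_k} \sum_{t \in I_k} P_t
```

where $I_k$ is the $k$-th 15-minute interval and $N_k$ is the number of one-second samples in it.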

This is the dataset and it has 50k entries.

Please paste the dataset as text (use the copy to clipboard icon in the dataset property editor). I'm not going to type all those in.

(And tell us the right answer, computed by hand if necessary, so we can check our code. That formula is not particularly clear.)


Here is the dataset, attached:
data.txt (40.3 KB)

It is a huge formula, so I am doing it part by part. Simply put, I need to get the summation of the 15-minute averaged P (power) values.

Do you want the averages to start on the 5-minute boundaries or the 15-minute boundaries? (Also, please follow directions on creating CSVs--I can't just paste this into Ignition because it wasn't copied out of Ignition's dataset property editor.)

It is 15min

The grouping part of the expression would look like this:

groupBy(
	{path.to.dataset},
	addMillis(it()['t_stamp'], -(toMillis(it()['t_stamp']) % 900000))
)

This will produce a list of 15-minute groups, in this shape:

[
	[2023-11-15 16:15:00.000, Dataset{nRx2C}],
	[2023-11-15 16:30:00.000, Dataset{nRx2C}],
	[2023-11-15 16:45:00.000, Dataset{nRx2C}],
	[2023-11-15 17:00:00.000, Dataset{nRx2C}],
	....
	[2023-11-15 18:45:00.000, Dataset{nRx2C}]
]

Note that the grouping key is the beginning of the 15-minute period, and the accompanying dataset contains just the rows that fall within that period.
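The truncation trick can be sanity-checked outside Ignition. A minimal plain-Python sketch (not the Toolkit expression language) showing how subtracting `millis % 900000` snaps a timestamp back to the most recent 15-minute boundary:

```python
from datetime import datetime, timezone

PERIOD_MS = 15 * 60 * 1000  # 900000 ms per 15-minute period

def truncate_to_period(epoch_ms, period_ms=PERIOD_MS):
    # Subtract the remainder to land on the start of the period.
    return epoch_ms - (epoch_ms % period_ms)

# Example: 16:37:12 UTC falls in the 16:30:00 bucket.
t = datetime(2023, 11, 15, 16, 37, 12, tzinfo=timezone.utc)
ms = int(t.timestamp() * 1000)
print(datetime.fromtimestamp(truncate_to_period(ms) / 1000.0, tz=timezone.utc))
# → 2023-11-15 16:30:00+00:00
```

Because the epoch origin is midnight UTC, the buckets land exactly on :00, :15, :30, and :45.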

To get all of the averages added together, you nest that inside an iterator that will take the mean of each inner dataset, and present as a list to sum(), like so:

sum(
	forEach(
		groupBy(
			{path.to.dataset},
			addMillis(it()['t_stamp'], -(toMillis(it()['t_stamp']) % 900000))
		),
		mean(it()[1], 'PoC_RealPowerTotal_kW')
	)
)

Untested.

The source dataset must be composed of complete sets of rows for each period. It is up to you to ensure the query binding that gets this data has a start timestamp that is similarly truncated to a 15-minute boundary, and is a multiple of fifteen minutes long. (Less one millisecond on the end timestamp, for ridiculous reasons, if coming from the tag historian.)
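The window-alignment requirement above can be sketched like so (plain Python; `aligned_window` and its arguments are illustrative names, not Ignition API):

```python
PERIOD_MS = 900000  # 15 minutes in milliseconds

def aligned_window(raw_start_ms, periods):
    # Truncate the start to a 15-minute boundary...
    start = raw_start_ms - (raw_start_ms % PERIOD_MS)
    # ...and make the window a whole number of periods long,
    # less one millisecond on the end per the historian caveat above.
    end = start + periods * PERIOD_MS - 1
    return start, end

start, end = aligned_window(1700064000123, 4)  # four 15-minute periods
print(start, end)
```

These two values would then feed the query binding's start and end timestamps.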

forEach(), groupBy(), and it() are functions from my Integration Toolkit. toMillis(), addMillis(), mean(), and sum() are native Ignition expression functions, in the "Aggregates" section.

First, Phil's module comes highly recommended and will be far more performant than doing this in a script, so if you can use his module, you absolutely should.

That being said, you can achieve this in script as well. I will note that this is not exactly trivial python to the uninitiated, so it will quite possibly look very cryptic. If that is the case, then you will just have to take me at my word that this function indeed returns the sum of the averages of 15 min groups from a dataset.

You may also want to consider running this in an asynchronous thread, since it could potentially take a fair amount of time to process (depending on the size of your dataset).

from itertools import groupby

# colToGroupBy: index of your t_stamp column; msTimeRange: bucket width in ms.
def dsGroup(ds, colToGroupBy, msTimeRange):
    # Key each row by its timestamp truncated back to the bucket boundary
    # (same modulus trick as the expression version above).
    def keyfunc(row):
        ts = row[colToGroupBy]
        return system.date.addMillis(ts, -(system.date.toMillis(ts) % msTimeRange))

    # groupby() groups consecutive rows sharing the same truncated timestamp,
    # which works because the dataset is ordered by timestamp.
    return (list(g) for _, g in groupby(ds, keyfunc))

ds = {path.to.dataset} #or a function/value that returns your dataset

# Column 0 holds the kW value, column 1 the t_stamp; 900000 ms = 15 min.
# float() guards against integer division under Jython 2.
print sum(sum(kw for kw, _ in group) / float(len(group)) for group in dsGroup(ds, 1, 900000))
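For anyone who wants to sanity-check the logic without Ignition, here is a plain-Python equivalent (no `system.date` calls; timestamps are epoch milliseconds, and the column indices are illustrative):

```python
from collections import OrderedDict

def sum_of_period_means(rows, ts_col, val_col, period_ms):
    """Group rows by truncated timestamp and return the sum of each
    group's mean of val_col."""
    buckets = OrderedDict()
    for row in rows:
        key = row[ts_col] - (row[ts_col] % period_ms)
        buckets.setdefault(key, []).append(row[val_col])
    return sum(sum(vals) / float(len(vals)) for vals in buckets.values())

# Tiny synthetic check: two 15-minute buckets with means 2.0 and 10.0.
rows = [
    (0,      1.0), (60000,  3.0),   # first period,  mean 2.0
    (900000, 9.0), (960000, 11.0),  # second period, mean 10.0
]
print(sum_of_period_means(rows, 0, 1, 900000))  # → 12.0
```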

Is it possible to search a dataset and extract the duration a plant has been in a specific step?

I have two columns: a timestamp column and a step number column.
I would like to add up the duration spent in each step.

Could sum (forEach(groupBy())) work?

Richard

The key will be to construct the equivalent of SQL's lead() function to convert adjacent timestamps into deltas. For the last row, you need to supply the period end timestamp as a separate value. Something like this:

forEach(
	groupBy(
		selectStar(
			{path.to.source.dataset},
			asMap('deltaMillis', 'L'),
			millisBetween(
				it()['t_stamp'],
				try(
					{path.to.source.dataset}[idx()+1, 't_stamp'],
					{path.to.period.end.t_stamp}
				)
			)
		),
		it()['step']
	),
	it()[0],
	sum(it()[1], 'deltaMillis')
)

That yields a list of pairs of step, summedStepMillis. You may wrap in unionAll() if you wish a dataset output.
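The lead()-style delta logic can be sanity-checked in plain Python. This sketch (illustrative, not the Toolkit expression) pairs each row's timestamp with the next row's, substitutes the supplied period-end timestamp for the last row, and totals the deltas per step:

```python
def step_durations(rows, period_end_ms):
    """rows: list of (t_stamp_ms, step) tuples, ordered by timestamp.
    Returns {step: total_millis spent in that step}."""
    totals = {}
    for i, (t, step) in enumerate(rows):
        # lead(): the next row's timestamp, or the period end for the last row.
        nxt = rows[i + 1][0] if i + 1 < len(rows) else period_end_ms
        totals[step] = totals.get(step, 0) + (nxt - t)
    return totals

rows = [(0, 'A'), (5000, 'B'), (12000, 'A'), (20000, 'B')]
print(step_durations(rows, 30000))  # → {'A': 13000, 'B': 17000}
```

Each step's total is the sum of all its visits, just like the grouped `sum(it()[1], 'deltaMillis')` above.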


Thanks @pturmel. So the timestamp needs to be converted to milliseconds, then each step's milliseconds are added together as the expression iterates through the dataset, until the end t_stamp is reached.

I am getting an 'Error_Configuration' on the 'try':

SyntaxError:("mismatched input 'try' expecting RPAREN", ('<string>',(,9,5,'\t\t\t\t\ttry(\n'))

Richard

Double check all my parentheses. There must be a mismatch. Or post your actual expression (with proper references) for me to review.

Morning @pturmel

My bad, sorry, your expression was perfectly fine I typed the wrong column name into it.

Thanks very much, it split each step and totaled the milliseconds, perfect :+1:

Richard