Dictionary value sum / avg?

Hello.
In "Dictionary", is it possible to find the sum of the average "B" values ​​of "A" values ​​for the same date and create one for each date?

ex)
a = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": null, "time": "2023-09-15", "Adata": null },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

===>>

[ { Bdata: sum,||time: 2023-09-15, Adata: average },
{ Bdata: sum,||time: 2023-09-18, Adata: average },
{ Bdata: sum,||time: 2023-09-19, Adata: average } ]

Please tell me how to change this!!

Tanks

Create another dictionary for the purpose of storing the sums of a values, b values and the count of records by the date. Iterate through your original list of dictionaries to fill the new dictionary accordingly. Finally, iterate through all the keys in the new dictionary to create a new list of dictionaries that has the date, average of 'A' values and sumof 'B' values.

quick example script that assumes you want to skip null values:

def packMyDictList(dictList):

	dateAccumulator = {}
	for item in dictList:
		dateKey = item['time']
		keyValue = dateAccumulator.setdefault(dateKey, {
			"aValuesAccum": 0,
			"bValuesAccum": 0,
			"aValuesCount": 0
		})

		aDataValue = item['Adata']
		if aDataValue is not None:
			keyValue["aValuesAccum"] += aDataValue
			keyValue["aValuesCount"] += 1

		bDataValue = item['Bdata']
		if bDataValue is not None:
			keyValue["bValuesAccum"] += bDataValue

		dateAccumulator.update({dateKey: keyValue})

	newAveragedAndSummedList = [
		{
			"time": key,
			"BDataSum": dateAccumulator[key]["bValuesAccum"],
			"ADataAvg": dateAccumulator[key]["aValuesAccum"]/dateAccumulator[key]["aValuesCount"]
		} for key in dateAccumulator
	]

	return newAveragedAndSummedList

which outputs the following when entering your example data:

[{'ADataAvg': 42.666666666666664, 'BDataSum': 1033.0, 'time': '2023-09-19'}
{'ADataAvg': 50.6, 'BDataSum': 63506.3, 'time': '2023-09-18'}
{'ADataAvg': 51.2, 'BDataSum': 10101.4, 'time': '2023-09-15'}]

this can be replaced with simply:
dateAccumulator[key]["aValuesAccum"]/len(dateAccumulator[key]["aValuesAccum"])

How does that work if dateAccumulator[key]["aValuesAccum"] is a float/int?

use this then:
float(dateAccumulator[key]["aValuesAccum"])/len(dateAccumulator[key]["aValuesAccum"])

a float/int = float

Ah no, I meant float/int as in the value might be a float type or an int type depending on what's fed. That's my bad for trying to save some typing.

I'm more confused how len(dateAccumulator[key]["aValuesAccum"]) gives you the same value as dateAccumulator[key]["aValuesCount"] should have. You aren't supposed to be able to call len() on an int or float

Oops, my bad!

float(dateAccumulator[key]["aValuesAccum"]) / len(dateAccumulator[key])

I'm still not getting how this works, I'm not sure if it's because its EOD for me or I'm missing something. Can you explain with crayons/paper? :laughing:

I think my system is out of coffee too... you're using an accum not a list of values, so no... this won't work... Let's start again :weary:

This is how I would do it:

a = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": None, "time": "2023-09-15", "Adata": None },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

grouped_data = {}
for item in a:
	time = item['time']
	grouped_data.setdefault(time, {'Adata': [], 'Bdata': []})
	if item['Adata'] is not None and item['Bdata'] is not None:
		grouped_data[time]['Adata'].append(item['Adata'])
		grouped_data[time]['Bdata'].append(item['Bdata'])

avg_data = []
for time, data in grouped_data.items():
	avg_a = sum(data['Adata'])/len(data['Adata'])
	sum_b = sum(data['Bdata'])
	avg_data.append({'time': time, 'Adata': avg_a, 'Bdata': sum_b})

Ah excellent, I'm not completely stupid then :upside_down_face:

Any reason for the preference of a list of values instead of a value accumulator?

Not particularly, I just prefer to keep the raw values for as long as possible :man_shrugging: maybe more calculations need to be done to them in the future (std. deviation, min, max, etc.). Keeping raw values lets you do that

1 Like

This is my take on the script that you and @ryan.white have provided.

sorted() and itertools.groupby() are used to group the data by the 'time' key.

from itertools import groupby

listIn = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": None, "time": "2023-09-15", "Adata": None },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

groups = {k:list(g) for k,g in groupby(sorted(listIn,key=lambda l:l['time']),lambda l:l['time'])}

results = [{
			'AdataAvg':sum(i['Adata'] for i in g if i['Adata']) / len(g),
			'BdataSum':sum(i['Bdata'] for i in g if i['Bdata']),
			'time': time
			}
			for time,g in groups.iteritems()
		]

Output:

>>>
[{'BdataSum': 1033.0, 'AdataAvg': 42.666666666666664, 'time': '2023-09-19'},
{'BdataSum': 63506.3, 'AdataAvg': 50.6, 'time': '2023-09-18'},
{'BdataSum': 10101.4, 'AdataAvg': 34.13333333333333, 'time': '2023-09-15'}]
>>> 

Could also easily throw a round() in there if you don't want a crazy number of decimals.

1 Like

I knew there was going to be a nicer way to do the grouping! good to know
Still waiting for @pgriffith to provide a one-liner though haha

1 Like

This is already basically a one liner.

Replace groups in the comprehension with the expression it was assigned, remove whitespaces: tada !

Doesn’t work because 'grouby' returns a key:generator pair, and generators don’t have a 'len' so you get a divide by zero error.

That's not the real issue.
If you replace groups (and ONLY groups) by the expression, it does work:

for time,g in {k:list(g) for k,g in groupby(sorted(listIn,key=lambda l:l['time']),lambda l:l['time'])}.iteritems()

Or you could calculate the means without using len:

return reduce(lambda x, (i, y): float(x*i + (y['Adata'] or 0)) / (i+1), enumerate(g), 0)

But the REAL issue, if you want to use the generators directly, is that you're using the same one twice.
Which doesn't work:

gen = (i for i in xrange(10))
a = sum(gen)
b = sum(gen)

a, b

(45, 0)

One should not reply prior to their morning caffeine injection.

If the source data is in a Perspective property, I'd use my groupBy() expression function. Like so:

forEach(
	groupBy(
		{path.to.source.property},
		it()['time'] // it() in this line is the groupBy()
	),
	asMap(
		'groupedTime', it()[0], // it() here is the outer forEach()
		'sumBdata', sum(
			forEach(
				it()[1], // it() here is the outer forEach()
				it()['Bdata'] // it() here is the forEach() inside the sum()
			)
		),
		'avgAdata', mean(
			forEach(
				it()[1], // it() here is the outer forEach()
				it()['Adata']// it() here is the forEach() inside the mean()
			)
		)
	)
)

Line breaks are optional in expression functions, so this could be a one-liner. :grin:

2 Likes

Yes... duly noted!

@pturmel
Much more betterer!

forEach(groupBy({path.to.source.property},it()['time']//it()inthislineisthegroupBy()),asMap('groupedTime',it()[0],//it()hereistheouterforEach()'sumBdata',sum(forEach(it()[1],//it()hereistheouterforEach()it()['Bdata']//it()hereistheforEach()insidethesum())),'avgAdata',mean(forEach(it()[1],//it()hereistheouterforEach()it()['Adata']//it()hereistheforEach()insidethemean()))))
1 Like

I think you have to ditch the comments when you squash it down to one line.