Dictionary value sum / avg?

JinGeun_Kim · September 18, 2023, 11:39pm

Hello.
In "Dictionary", is it possible to find the sum of the average "B" values of "A" values for the same date and create one for each date?

ex)
a = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": null, "time": "2023-09-15", "Adata": null },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

===>>

[ { Bdata: sum,||time: 2023-09-15, Adata: average },
{ Bdata: sum,||time: 2023-09-18, Adata: average },
{ Bdata: sum,||time: 2023-09-19, Adata: average } ]

Please tell me how to change this!!

Tanks

ryan.white · September 19, 2023, 12:43am

Create another dictionary for the purpose of storing the sums of a values, b values and the count of records by the date. Iterate through your original list of dictionaries to fill the new dictionary accordingly. Finally, iterate through all the keys in the new dictionary to create a new list of dictionaries that has the date, average of 'A' values and sumof 'B' values.

quick example script that assumes you want to skip null values:

def packMyDictList(dictList):

	dateAccumulator = {}
	for item in dictList:
		dateKey = item['time']
		keyValue = dateAccumulator.setdefault(dateKey, {
			"aValuesAccum": 0,
			"bValuesAccum": 0,
			"aValuesCount": 0
		})

		aDataValue = item['Adata']
		if aDataValue is not None:
			keyValue["aValuesAccum"] += aDataValue
			keyValue["aValuesCount"] += 1

		bDataValue = item['Bdata']
		if bDataValue is not None:
			keyValue["bValuesAccum"] += bDataValue

		dateAccumulator.update({dateKey: keyValue})

	newAveragedAndSummedList = [
		{
			"time": key,
			"BDataSum": dateAccumulator[key]["bValuesAccum"],
			"ADataAvg": dateAccumulator[key]["aValuesAccum"]/dateAccumulator[key]["aValuesCount"]
		} for key in dateAccumulator
	]

	return newAveragedAndSummedList

which outputs the following when entering your example data:

[{'ADataAvg': 42.666666666666664, 'BDataSum': 1033.0, 'time': '2023-09-19'}
{'ADataAvg': 50.6, 'BDataSum': 63506.3, 'time': '2023-09-18'}
{'ADataAvg': 51.2, 'BDataSum': 10101.4, 'time': '2023-09-15'}]

nminchin · September 19, 2023, 12:47am

this can be replaced with simply:
dateAccumulator[key]["aValuesAccum"]/len(dateAccumulator[key]["aValuesAccum"])

ryan.white · September 19, 2023, 12:51am

How does that work if dateAccumulator[key]["aValuesAccum"] is a float/int?

nminchin · September 19, 2023, 12:52am

use this then:
float(dateAccumulator[key]["aValuesAccum"])/len(dateAccumulator[key]["aValuesAccum"])

a float/int = float

ryan.white · September 19, 2023, 12:54am

Ah no, I meant float/int as in the value might be a float type or an int type depending on what's fed. That's my bad for trying to save some typing.

I'm more confused how len(dateAccumulator[key]["aValuesAccum"]) gives you the same value as dateAccumulator[key]["aValuesCount"] should have. You aren't supposed to be able to call len() on an int or float

nminchin · September 19, 2023, 1:02am

Oops, my bad!

float(dateAccumulator[key]["aValuesAccum"]) / len(dateAccumulator[key])

ryan.white · September 19, 2023, 1:06am

I'm still not getting how this works, I'm not sure if it's because its EOD for me or I'm missing something. Can you explain with crayons/paper?

nminchin · September 19, 2023, 1:15am

I think my system is out of coffee too... you're using an accum not a list of values, so no... this won't work... Let's start again

This is how I would do it:

a = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": None, "time": "2023-09-15", "Adata": None },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

grouped_data = {}
for item in a:
	time = item['time']
	grouped_data.setdefault(time, {'Adata': [], 'Bdata': []})
	if item['Adata'] is not None and item['Bdata'] is not None:
		grouped_data[time]['Adata'].append(item['Adata'])
		grouped_data[time]['Bdata'].append(item['Bdata'])

avg_data = []
for time, data in grouped_data.items():
	avg_a = sum(data['Adata'])/len(data['Adata'])
	sum_b = sum(data['Bdata'])
	avg_data.append({'time': time, 'Adata': avg_a, 'Bdata': sum_b})

ryan.white · September 19, 2023, 1:22am

Ah excellent, I'm not completely stupid then

Any reason for the preference of a list of values instead of a value accumulator?

nminchin · September 19, 2023, 1:30am

Not particularly, I just prefer to keep the raw values for as long as possible maybe more calculations need to be done to them in the future (std. deviation, min, max, etc.). Keeping raw values lets you do that

lrose · September 19, 2023, 3:32am

This is my take on the script that you and @ryan.white have provided.

sorted() and itertools.groupby() are used to group the data by the 'time' key.

from itertools import groupby

listIn = [ { "Bdata": 2826.5, "time": "2023-09-15", "Adata": 51.5 },
{ "Bdata": 29629.0, "time": "2023-09-18", "Adata": 50.8 },
{ "Bdata": 346.8, "time": "2023-09-19", "Adata": 32.8 },
{ "Bdata": 7274.9, "time": "2023-09-15", "Adata": 50.9 },
{ "Bdata": 29630.3, "time": "2023-09-18", "Adata": 50.7 },
{ "Bdata": 334.9, "time": "2023-09-19", "Adata": 40.4 },
{ "Bdata": None, "time": "2023-09-15", "Adata": None },
{ "Bdata": 4247.0, "time": "2023-09-18", "Adata": 50.3 },
{ "Bdata": 351.3, "time": "2023-09-19", "Adata": 54.8 } ]

groups = {k:list(g) for k,g in groupby(sorted(listIn,key=lambda l:l['time']),lambda l:l['time'])}

results = [{
			'AdataAvg':sum(i['Adata'] for i in g if i['Adata']) / len(g),
			'BdataSum':sum(i['Bdata'] for i in g if i['Bdata']),
			'time': time
			}
			for time,g in groups.iteritems()
		]

Output:

>>>
[{'BdataSum': 1033.0, 'AdataAvg': 42.666666666666664, 'time': '2023-09-19'},
{'BdataSum': 63506.3, 'AdataAvg': 50.6, 'time': '2023-09-18'},
{'BdataSum': 10101.4, 'AdataAvg': 34.13333333333333, 'time': '2023-09-15'}]
>>>

Could also easily throw a round() in there if you don't want a crazy number of decimals.

nminchin · September 19, 2023, 4:03am

I knew there was going to be a nicer way to do the grouping! good to know
Still waiting for @pgriffith to provide a one-liner though haha

pascal.fragnoud · September 19, 2023, 9:41am

This is already basically a one liner.

Replace groups in the comprehension with the expression it was assigned, remove whitespaces: tada !

lrose · September 19, 2023, 10:54am

~~Doesn’t work because 'grouby' returns a key:generator pair, and generators don’t have a 'len' so you get a divide by zero error.~~

pascal.fragnoud · September 19, 2023, 11:17am

That's not the real issue.
If you replace groups (and ONLY groups) by the expression, it does work:

for time,g in {k:list(g) for k,g in groupby(sorted(listIn,key=lambda l:l['time']),lambda l:l['time'])}.iteritems()

Or you could calculate the means without using len:

return reduce(lambda x, (i, y): float(x*i + (y['Adata'] or 0)) / (i+1), enumerate(g), 0)

But the REAL issue, if you want to use the generators directly, is that you're using the same one twice.
Which doesn't work:

gen = (i for i in xrange(10))
a = sum(gen)
b = sum(gen)

a, b

(45, 0)

lrose · September 19, 2023, 11:51am

One should not reply prior to their morning caffeine injection.

pturmel · September 19, 2023, 12:02pm

If the source data is in a Perspective property, I'd use my groupBy() expression function. Like so:

forEach(
	groupBy(
		{path.to.source.property},
		it()['time'] // it() in this line is the groupBy()
	),
	asMap(
		'groupedTime', it()[0], // it() here is the outer forEach()
		'sumBdata', sum(
			forEach(
				it()[1], // it() here is the outer forEach()
				it()['Bdata'] // it() here is the forEach() inside the sum()
			)
		),
		'avgAdata', mean(
			forEach(
				it()[1], // it() here is the outer forEach()
				it()['Adata']// it() here is the forEach() inside the mean()
			)
		)
	)
)

Line breaks are optional in expression functions, so this could be a one-liner.

nminchin · September 20, 2023, 4:31am

Yes... duly noted!

@pturmel
Much more betterer!

forEach(groupBy({path.to.source.property},it()['time']//it()inthislineisthegroupBy()),asMap('groupedTime',it()[0],//it()hereistheouterforEach()'sumBdata',sum(forEach(it()[1],//it()hereistheouterforEach()it()['Bdata']//it()hereistheforEach()insidethesum())),'avgAdata',mean(forEach(it()[1],//it()hereistheouterforEach()it()['Adata']//it()hereistheforEach()insidethemean()))))

pturmel · September 20, 2023, 12:06pm

I think you have to ditch the comments when you squash it down to one line.