Find duplicates in a dataset and sum their values


How can I sum the values of a dataset having duplicate records?
New York appears twice in the below example. I need to sum “New York” Population (8363710 + 532) and display the row once.

headers = [“City”, “Population”, “Timezone”, “GMTOffset”]
data = []
data.append([“New York”, 8363710, “EST”, -5])
data.append([“Los Angeles”, 3833995, “PST”, -8])
data.append([“Chicago”, 2853114, “CST”, -6])
data.append([“Houston”, 2242193, “CST”, -6])
data.append([“Phoenix”, 1567924, “MST”, -7])
data.append([“New York”, 532, “EST”, -5])

cities = system.dataset.toDataSet(headers, data)

I’m sure there’s a more clever solution out there, but try this:

headers = ["City", "Population", "Timezone", "GMTOffset"]
data = []
data.append(["New York", 8363710, "EST", -5])
data.append(["Los Angeles", 3833995, "PST", -8])
data.append(["Chicago", 2853114, "CST", -6])
data.append(["Houston", 2242193, "CST", -6])
data.append(["Phoenix", 1567924, "MST", -7])
data.append(["New York", 532, "EST", -5])

cities = system.dataset.toDataSet(headers, data)

d = {}

for row in system.dataset.toPyDataSet(cities):
	city = row[0]
	if (city in d):
		population = d[city].pop(0)
		population = population + row[1]
		d[city].insert(0, population)
		d[city] = [row[1], row[2], row[3]]

summed_data = [[key, d[key][0], d[key][1], d[key][2]] for key in d]
summed_cities = system.dataset.toDataSet(headers, data)

print summed_data
1 Like

Thanks a lot.
This works perfectly fine

Hi Kevin,

Please, how can I search through the City column of the dataset and return “Green” value if it contains the state “Chicago”.

Read up on the ConfigureCell function and look at the examples in the code on your table.
Also the University has a course on it

You will basically end up around:

	if colName=="City":
		if value=="Chicago":
			return {'background': "green"}```