Removing duplicate rows from a dataset

hachjessed · December 29, 2020, 4:47pm

Beginning with a basic dataset, what would be the simplest way to remove duplicate rows of data?

Input ds:

ID   Value 
01   val1
02   val2
03   val3
01   val1
02   val2
02   val2

Desired output ds:

ID   Value 
01   val1
02   val2
03   val3

chandler · December 29, 2020, 5:28pm

You can use a Python set to remove the duplicates for you. Note that a set is unordered, so the rows may not come back in their original order:

# get a list of your header names
headers = system.dataset.getColumnHeaders(oldDataset)

# convert to a PyDataSet so the rows can be iterated like a Python sequence
pyData = system.dataset.toPyDataSet(oldDataset)

# turn each row into a tuple so it is hashable
tuples = map(tuple, pyData)

# create a set from the list of tuples; identical rows collapse into one
pySet = set(tuples)

# convert the set back into a list
newData = list(pySet)

# create an Ignition dataset from the headers and the data without duplicates
newDataset = system.dataset.toDataSet(headers, newData)
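
If the row order matters (the desired output in the question keeps 01, 02, 03 in sequence), a set alone will not guarantee it. Here is a minimal sketch of an order-preserving variant. The helper name dedupe_rows and the sample rows are made up for illustration, and the Ignition wiring shown in the trailing comments is an assumption based on the calls used above, not something tested here.

```python
def dedupe_rows(rows):
    """Return the rows as a list of lists, duplicates removed, first-seen order kept."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, so they can be stored in a set
        if key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique


# sample data mirroring the input dataset from the question
rows = [
    ["01", "val1"],
    ["02", "val2"],
    ["03", "val3"],
    ["01", "val1"],
    ["02", "val2"],
    ["02", "val2"],
]
unique_rows = dedupe_rows(rows)

# Assumed Ignition usage, reusing the calls from the script above:
#   headers = system.dataset.getColumnHeaders(oldDataset)
#   pyData = system.dataset.toPyDataSet(oldDataset)
#   newDataset = system.dataset.toDataSet(headers, dedupe_rows(pyData))
```

The set is only used to remember which rows have been seen; the output list is built in the order the rows first appear.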