[IGN-9825] Complex dataset sorting

justin.brzozoski · May 9, 2024, 3:34pm

I had an issue where I needed to do an unusually complex sort on a dataset, based on parsing substrings from some columns and chained values in other columns. I was expecting it to be a giant slog, and I was pleasantly surprised when all the iterables, datatypes, and Python features aligned and my end result was basically this:

def dataset_supersort(dset, key_function):
	"""
	Sort an Ignition dataset using arbitrarily complex sorting rules.

	key_function must be a function or lambda as described here:

	https://docs.python.org/2.7/howto/sorting.html

	The key function will receive a PyRow object as its parameter.

	Note that this also allows the use of comparison functions via
	functools.cmp_to_key.
	"""
	# Sort the rows in Python, then append them to an empty copy of the
	# original dataset so the column names and types are preserved.
	sorted_rows = sorted(system.dataset.toPyDataSet(dset), key=key_function)
	return system.dataset.addRows(system.dataset.clearDataset(dset), sorted_rows)

# As an example, let's assume that all your Sparkplug edge nodes have numeric
# strings as their edge node IDs, and you want to sort those numerically while
# ignoring the group IDs...
tpath = '[MQTT Engine]Engine Info/Edge Nodes/Online Nodes'
dset = system.tag.readBlocking([tpath])[0].getValue()
sorted_dset = dataset_supersort(dset, lambda x: int(x['GroupID/EdgeNodeId'].split('/')[1]))

My actual key function was more complex than that lambda, but even the one in that example is something you could never do with the built-in dataset sorting.
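To see why the key function matters, here is a plain-Python sketch that runs outside Ignition. It uses dicts as stand-ins for PyRow objects (both support `row['column name']` lookups), and the sample IDs are made up. Sorting the raw strings lets the group ID dominate and puts "10" before "2"; the lambda from the example compares the parsed integers instead.

```python
# Hypothetical sample rows standing in for the Online Nodes dataset.
rows = [
    {'GroupID/EdgeNodeId': 'GroupA/10'},
    {'GroupID/EdgeNodeId': 'GroupB/2'},
    {'GroupID/EdgeNodeId': 'GroupA/1'},
    {'GroupID/EdgeNodeId': 'GroupC/100'},
    {'GroupID/EdgeNodeId': 'GroupB/20'},
]

# Same lambda as the example above: parse the edge node ID as an int,
# ignoring the group ID entirely.
key_function = lambda x: int(x['GroupID/EdgeNodeId'].split('/')[1])

# Plain string ordering (roughly what a single-column string sort gives you).
lexical = [r['GroupID/EdgeNodeId']
           for r in sorted(rows, key=lambda x: x['GroupID/EdgeNodeId'])]

# Numeric ordering on the parsed edge node ID.
numeric = [r['GroupID/EdgeNodeId'] for r in sorted(rows, key=key_function)]
```

Here `lexical` comes out grouped by group ID with "GroupA/10" right after "GroupA/1", while `numeric` runs 1, 2, 10, 20, 100 regardless of group.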

I'm sure other people have managed this before, so it's not a huge discovery, but I've never seen anyone share this method in an example on the forum. So maybe having it here will help someone.
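For anyone who hasn't used `cmp_to_key` (mentioned in the docstring): it wraps an old-style comparison function, one that returns a negative, zero, or positive number, so it can be passed anywhere a key function is expected. This is a plain-Python sketch; `compare_nodes` and the sample IDs are hypothetical. Inside Ignition you would presumably pass something like `cmp_to_key(compare_rows)` as `key_function`, where the hypothetical `compare_rows` reads PyRow columns instead of raw strings.

```python
from functools import cmp_to_key

def compare_nodes(a, b):
    # Hypothetical rule: numeric edge node ID ascending, then group ID
    # descending when the edge node IDs tie.
    group_a, node_a = a.split('/')
    group_b, node_b = b.split('/')
    node_a, node_b = int(node_a), int(node_b)
    if node_a != node_b:
        return -1 if node_a < node_b else 1
    if group_a != group_b:
        # Larger group ID sorts first.
        return -1 if group_a > group_b else 1
    return 0

ids = ['GroupA/2', 'GroupB/1', 'GroupB/2', 'GroupA/1']
ordered = sorted(ids, key=cmp_to_key(compare_nodes))
```

`functools.cmp_to_key` exists in both Python 2.7 and Python 3, so the same pattern should carry over to Jython 2.7.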

PGriffith · May 9, 2024, 4:18pm

Hmm... I think I could get away with overloading system.dataset.sort to allow you to pass a callable as the second argument, allowing it to work just like sorted. That'd be a nice boost in power for the function. Neat idea!
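The overload being described boils down to checking whether the second argument is callable. Here is a plain-Python sketch of that dispatch; `sort_rows` is a hypothetical name, not an Ignition API, and the rows are plain lists rather than a real dataset.

```python
def sort_rows(rows, key_or_column, ascending=True):
    """Hypothetical sketch: sort by a column (index or name) or by a callable key."""
    if callable(key_or_column):
        # Behave like sorted(): the callable receives each row.
        key = key_or_column
    else:
        # Behave like a single-column sort: look the column up in each row.
        column = key_or_column
        key = lambda row: row[column]
    # reverse=True still keeps the sort stable for equal keys.
    return sorted(rows, key=key, reverse=not ascending)

rows = [[3, 'c'], [1, 'a'], [2, 'b']]
by_column = sort_rows(rows, 0)
by_callable = sort_rows(rows, lambda r: r[1], ascending=False)
```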

justin.brzozoski · May 20, 2024, 7:15pm

I had a few moments and decided to see whether it was faster to use my method above or to add a decorator key column, sort on that, then remove the key column. On my first test I bumped into the issue that Ignition datasets can't have tuples as a column datatype. Python's sorted being able to sort on tuples is a major feature, since it lets you sort on multiple columns in order of priority in one call. I couldn't figure out a quick way to imitate that behavior in a key column using a datatype Ignition allows.
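For reference, this is what the tuple-key behavior looks like in plain Python. Tuples compare element by element, so the first field has the highest priority and later fields only break ties. The column names and values below are hypothetical.

```python
# Hypothetical rows; dicts stand in for PyRow objects.
rows = [
    {'line': 2, 'station': 'B', 'count': 5},
    {'line': 1, 'station': 'C', 'count': 7},
    {'line': 2, 'station': 'A', 'count': 3},
    {'line': 1, 'station': 'A', 'count': 9},
]

# Sort by 'line' first, then by 'station' within each line, in one call.
ordered = sorted(rows, key=lambda r: (r['line'], r['station']))
order_summary = [(r['line'], r['station']) for r in ordered]
```

The resulting order is (1, 'A'), (1, 'C'), (2, 'A'), (2, 'B'). A typed dataset column has no datatype that holds such a tuple, which is the pitfall described above.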

When you do get around to this feature, be sure your callable method supports sorting multiple columns by priority, so we can do multi-column sorts with one call to dataset.sort.

I'm still hoping to compare sorting performance using a temporary key column versus my method above, but figured this pitfall was worth noting...

PGriffith · May 20, 2024, 8:08pm

You could use a DatasetBuilder to add each row in the correct output order, to make your construction more explicit.

pturmel · May 20, 2024, 8:13pm

I always use the DatasetBuilder, so I can control the column types explicitly.

justin.brzozoski · May 20, 2024, 8:24pm

You guys misunderstood: I was trying to use a tuple as the datatype of my sort key column, so that a single call to dataset.sort on that one column would effectively perform a multi-column sort in one pass.

In Python land, writing a key function for sorted that returns tuples is how you handle multiple columns in one sorting pass. My dataset_supersort supports them, and I was trying to reuse a complex tuple-returning lambda I had already written as the test case for the key-column approach.

PGriffith · May 20, 2024, 8:26pm

Right, but you're already doing the sorting yourself.
Instead of calling system.dataset.sort to output your sorted dataset, use a DatasetBuilder to construct it directly from the output of sorted().

pturmel · May 20, 2024, 8:35pm

This blows up when you want some, but not all, of the keys to sort in reverse order. I made a special function in my Toolkit to use with orderBy() to address this.
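With tuple keys, `reverse=True` flips every field at once. One standard workaround from the Python sorting HOW-TO (a different technique from the Toolkit function mentioned above) is to rely on sort stability: sort by the lowest-priority column first and the highest-priority column last, each pass with its own direction. `multi_sort` and the sample rows are hypothetical.

```python
from operator import itemgetter

def multi_sort(rows, spec):
    """Sort by several columns, each with its own direction.

    spec is a list of (column, descending) pairs, highest priority first.
    Python's sort is stable, so later passes preserve the order produced
    by earlier passes among rows whose keys are equal.
    """
    result = list(rows)
    for column, descending in reversed(spec):
        result.sort(key=itemgetter(column), reverse=descending)
    return result

rows = [('A', 1), ('B', 3), ('A', 3), ('B', 1)]
# Column 0 ascending, then column 1 descending within each group.
ordered = multi_sort(rows, [(0, False), (1, True)])
```

The trade-off is one sort pass per column instead of a single pass, but it works for any sortable column type, including strings that can't simply be negated.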

justin.brzozoski · May 20, 2024, 8:36pm

I was trying to do a comparison test of dataset_supersort versus this version of a keyed sort function:

def dataset_altsort(dset, key_func):
	if dset.getRowCount() < 2:
		return dset
	init_columns = list(dset.getColumnNames())
	key_col_values = [key_func(x) for x in system.dataset.toPyDataSet(dset)]
	key_type = type(key_col_values[0])
	unsorted_keyed_dset = system.dataset.addColumn(dset, 0, key_col_values, 'temp_key_column', key_type)
	sorted_keyed_dset = system.dataset.sort(unsorted_keyed_dset, 0)
	return system.dataset.filterColumns(sorted_keyed_dset, init_columns)

I wanted to know whether there was a major performance difference. By dumb luck, my first test case used a key_func that returns tuples, which works with dataset_supersort but not with dataset_altsort.
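dataset_altsort is a decorate-sort-undecorate pattern. In plain Python the "temporary key column" can just be a list of tuples, which has no column-type restriction, so tuple keys work there. This is a sketch outside Ignition; `decorated_sort` is a hypothetical name and the rows are dicts standing in for PyRow objects.

```python
def decorated_sort(rows, key_func):
    # Decorate: compute each key once and pair it with the row's original
    # index. The index breaks ties, so rows are never compared directly
    # (dicts aren't orderable in Python 3) and equal keys keep input order.
    decorated = [(key_func(row), index, row) for index, row in enumerate(rows)]
    # Sort on (key, index); the key may itself be a tuple.
    decorated.sort()
    # Undecorate: drop the temporary key and index.
    return [row for _key, _index, row in decorated]

rows = [{'a': 2, 'b': 'x'}, {'a': 1, 'b': 'y'}, {'a': 2, 'b': 'a'}]
by_tuple = decorated_sort(rows, lambda r: (r['a'], r['b']))
by_single = decorated_sort(rows, lambda r: r['a'])
```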

justin.brzozoski · May 20, 2024, 8:42pm

There is a way. (I admit I haven't tried this in Jython 2.7, but I see no reason it wouldn't work...)

pturmel · May 20, 2024, 8:48pm

That is precisely what my descending() function does, for Java Comparable<> instances.
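The Toolkit's descending() isn't shown in this thread, so the following is only a plain-Python analogue of the same idea: a wrapper that inverts `<` so a single field can sort descending inside an otherwise ascending tuple key. `Descending` is a hypothetical class name.

```python
class Descending(object):
    """Hypothetical wrapper that reverses the ordering of a comparable value."""
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value

    # Tuple comparison checks equality first, so == and != must delegate.
    def __eq__(self, other):
        return self.value == other.value

    def __ne__(self, other):
        return self.value != other.value

    def __lt__(self, other):
        # Inverted: this wrapper sorts first when its wrapped value is larger.
        return other.value < self.value

rows = [('A', 1), ('B', 3), ('A', 3), ('B', 1)]
# Column 0 ascending, column 1 descending, in a single sorted() call.
mixed = sorted(rows, key=lambda r: (r[0], Descending(r[1])))
# Works for strings too, where negating the value isn't an option.
strings_desc = sorted(rows, key=lambda r: (Descending(r[0]), r[1]))
```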