[Bug] sortDataset Expression Function Odd Alphanumeric Sort Order

WillMT10 · April 21, 2020, 9:57pm

I think this is a bug, but I am not certain. In the Vision module, the sortDataset() expression function handles strings that contain both letters and numbers a little oddly. It sorts character by character for letters, which is expected, and I would expect the same behavior for every character in the string, whether or not it is a letter. However, it appears that any sequence of digits within the string is treated as a single number at the character position where the sequence starts.

Example:
Unsorted dataset (one string type column):

A33Z
A8F3
A1ZZ

Result of sortDataset Expression (ascending argument = True):

A1ZZ
A8F3
A33Z (because 33 > 8?)

I would expect the following (and the Python list.sort() method matches this):

A1ZZ
A33Z
A8F3
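
For comparison, here is a minimal sketch in plain Python (not Ignition's actual implementation) that reproduces both orderings on the three strings above. The helper natural_key is hypothetical: it assumes the "odd" order comes from treating each run of digits as one integer, which matches the observed result but is not confirmed by the source.

```python
import re

def natural_key(s):
    # Hypothetical helper: split into digit and non-digit runs.
    # With one capture group, re.split puts the digit runs at odd indices.
    parts = re.split(r'(\d+)', s)
    key = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            key.append((0, int(part)))  # digit run compared as one number
        elif part:
            key.append((1, part))       # text run compared character by character
    return key

rows = ["A33Z", "A8F3", "A1ZZ"]
plain = sorted(rows)                     # character-by-character comparison
friendly = sorted(rows, key=natural_key) # digit runs compared as numbers
print(plain)
print(friendly)
```

The plain sort places A33Z before A8F3 because '3' sorts before '8'; the key-based sort places A8F3 first because 8 < 33.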

I see the same results on 8.0.10 (Linux) and 7.9.12 (Windows).

Kevin.Herron · April 21, 2020, 10:33pm

Looks like it intentionally uses a more “friendly” alphanumeric sorting method when sorting on strings. Starting in… Ignition 7.8.3.

WillMT10 · April 21, 2020, 10:43pm

Ha, perfect description!

I could see where the current behavior would be useful ('PLC_1', 'PLC_8', 'PLC_100'). If it is intentional, then it looks like the user manual needs a revision (unless I'm reading it wrong). In that case, it would also be great to have an optional flag that switches the sort behavior to a true character-by-character sort.
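
A quick plain-Python sketch of why a numeric-aware order helps with names like these. The helper suffix_number is hypothetical and only handles names that end in an underscore followed by digits; it is not how Ignition does it.

```python
names = ["PLC_8", "PLC_100", "PLC_1"]

# Plain string comparison goes character by character,
# so "PLC_100" lands before "PLC_8" ('1' sorts before '8').
plain = sorted(names)

def suffix_number(name):
    # Hypothetical helper: the integer after the last underscore.
    return int(name.rsplit("_", 1)[1])

# Sorting on the numeric suffix gives the order most people expect.
numeric = sorted(names, key=suffix_number)
print(plain)
print(numeric)
```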

Kevin.Herron · April 21, 2020, 10:44pm

I can bring it up, but the original ticket from 2016 that implemented this explicitly ruled out the option of making it configurable.

For now, roll your own like before we had this scripting method: How do you sort a dataset?

Kevin.Herron · April 21, 2020, 10:51pm

Ooph, I just realized this was about the expression function, not the scripting function. It uses the same utility method with the “friendly” sort order, but rolling your own would mean relying on runScript + my previous post.

WillMT10 · April 21, 2020, 10:59pm

Bummer. Well, I guess I would change my feature request to creating a second sortDataset function with the character-by-character sort.

Yeah, we came up with a workaround that is very similar to that post and uses the runScript() function. The simplicity of the sortDataset function was nice...makes things a lot cleaner.

So I guess the only bug here is in the user manual (for both the expression and scripting functions):

Sorts a dataset and returns the sorted dataset. This works on numeric, as well as alphanumeric columns. It will go character by character, going from 0-9, A-Z, a-z.
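
The order the manual describes (0-9, then A-Z, then a-z) is the same as plain code point order for ASCII characters, which is what Python's default string comparison uses. A small sketch to check that, using only built-ins:

```python
chars = ["a", "Z", "9", "A", "0", "z"]

# Default string comparison orders by code point:
# digits (48-57), then uppercase (65-90), then lowercase (97-122).
ordered = sorted(chars)
code_points = [ord(c) for c in ordered]
print(ordered)
print(code_points)
```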

Paul.Scott · April 21, 2020, 11:47pm

Yikes! That's embarrassing. Sorry about the mix-up there. We'll have the documentation team make the corrections.

WillMT10 · April 22, 2020, 12:18am

Just in case someone else only needs to sort by one column in the dataset, you can use this smaller piece of code:

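def sortDataset(ds, sort_col=0, reverse_order=False):
	# Keep the column headers so the sorted dataset can be rebuilt.
	hdrs = system.dataset.getColumnHeaders(ds)
	# Copy the rows into plain Python lists so they can be sorted.
	py_ds = system.dataset.toPyDataSet(ds)
	ds_array = [[col for col in row] for row in py_ds]
	# Python compares strings character by character.
	ds_array.sort(key=lambda x: x[sort_col], reverse=reverse_order)

	return system.dataset.toDataSet(hdrs, ds_array)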
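
The sort step itself can be tried outside Ignition. This is only a sketch with made-up rows standing in for a dataset; sort_rows is a hypothetical stand-in that skips the system.dataset conversions.

```python
def sort_rows(rows, sort_col=0, reverse_order=False):
    # Return a new list sorted on one column; strings compare
    # character by character, numbers compare numerically.
    return sorted(rows, key=lambda row: row[sort_col], reverse=reverse_order)

# Made-up sample rows: a string column and an integer column.
rows = [["A33Z", 3], ["A8F3", 1], ["A1ZZ", 2]]

by_name = sort_rows(rows)
by_count_desc = sort_rows(rows, sort_col=1, reverse_order=True)
print(by_name)
print(by_count_desc)
```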