[Bug] sortDataset Expression Function Odd Alphanumeric Sort Order

I think this is a bug, but I am not certain. In the Vision module, the sortDataset() expression function handles strings that contain both letters and numbers a little oddly. It sorts character by character for letters, which is expected, and I would expect the same behavior for every character in the string, whether or not it is a letter. However, it appears that any run of consecutive digits within the string is treated as a single number, compared at the character position where the run starts.

Example:
Unsorted dataset (one string type column):

  1. A33Z
  2. A8F3
  3. A1ZZ

Result of sortDataset Expression (ascending argument = True):

  1. A1ZZ
  2. A8F3
  3. A33Z (because 33 > 8?)

I would expect (and the Python list.sort() method matches this):

  1. A1ZZ
  2. A33Z
  3. A8F3
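
Both orderings can be reproduced in plain Python. The sketch below only approximates the "digit runs compare as whole numbers" behavior described above; it is not Ignition's actual implementation, and natural_key is a made-up helper name.

```python
import re

def natural_key(text):
    # Hypothetical helper: split into alternating text / digit-run chunks.
    # re.split with a capturing group always puts text at even indexes and
    # digit runs at odd indexes, so the types line up when keys are compared.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

values = ["A33Z", "A8F3", "A1ZZ"]

char_order = sorted(values)                      # plain character-by-character comparison
natural_order = sorted(values, key=natural_key)  # digit runs compared as whole numbers

print(char_order)     # ['A1ZZ', 'A33Z', 'A8F3']
print(natural_order)  # ['A1ZZ', 'A8F3', 'A33Z']
```

The second result matches what sortDataset returns for this column, which is consistent with the digits 33 being compared to 8 as numbers rather than as the characters '3' and '8'.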

I see the same results on 8.0.10 (Linux) and 7.9.12 (Windows).

Looks like it intentionally uses a more “friendly” alphanumeric sorting method when sorting on strings. Starting in… Ignition 7.8.3.

Ha, perfect description!

I could see where the current behavior would be useful ('PLC_1', 'PLC_8', 'PLC_100'). If it is intentional, then it looks like the user manual needs a revision (unless I'm reading it wrong). If it is intentional, it would also be great to have an optional flag that switches the sort to a true character-by-character sort.
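
For comparison, this is what a strict character-by-character sort does to names like those in plain Python (not Ignition). The tags list is only an illustration built from the three example names.

```python
# Example names from the post, deliberately shuffled
tags = ["PLC_8", "PLC_100", "PLC_1"]

# Default string comparison: '1' < '8', so PLC_100 lands before PLC_8
char_order = sorted(tags)

print(char_order)  # ['PLC_1', 'PLC_100', 'PLC_8']
```

That ordering is presumably the case the "friendly" sort was meant to avoid.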

I can bring it up, but the original ticket from 2016 that implemented this explicitly ruled out the option of making it configurable.

For now, roll your own like before we had this scripting method: How do you sort a dataset?

Ooph, I just realized this was about the expression function, not scripting function. It uses the same utility method with the “friendly” sort order, but rolling your own would mean relying on runScript + my previous post.

Bummer. Well I guess I would change my feature request to creating a second sortDataset function with the character by character sort :sweat_smile:

Yeah, we came up with a workaround that is very similar to that post, using the runScript() function. The simplicity of the sortDataset function was nice... it makes things a lot cleaner.

So I guess the only bug here is in the user manual (for both the expression and scripting functions):

Sorts a dataset and returns the sorted dataset. This works on numeric, as well as alphanumeric columns. It will go character by character, going from 0-9, A-Z, a-z.

Yikes! That's embarrassing. Sorry about the mix-up there. We'll have the documentation team make the corrections.

Just in case someone else only needs to sort by one column in the dataset, here is a smaller piece of code:

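def sortDataset(ds, sort_col=0, reverse_order=False):
	# Sort an Ignition dataset by a single column index.
	# Python's default comparison is used, so strings sort character by character.
	hdrs = system.dataset.getColumnHeaders(ds)
	py_ds = system.dataset.toPyDataSet(ds)
	# Copy each PyDataSet row into a plain, sortable list
	ds_array = [list(row) for row in py_ds]
	ds_array.sort(key=lambda row: row[sort_col], reverse=reverse_order)
	return system.dataset.toDataSet(hdrs, ds_array)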
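
The sorting step itself can be tried outside Ignition on plain lists. This is only a rough sketch and not part of the post above: sort_rows and sort_cols are made-up names, and the tuple key is just one way to stretch the same idea to more than one column. Inside Ignition the rows would still come from system.dataset.toPyDataSet and go back through system.dataset.toDataSet, as in the function above.

```python
def sort_rows(rows, sort_cols=(0,), reverse_order=False):
    # Hypothetical helper: rows is a list of lists, sort_cols holds the
    # column indexes to compare, in priority order.
    return sorted(rows,
                  key=lambda row: tuple(row[c] for c in sort_cols),
                  reverse=reverse_order)

rows = [["A33Z", 2], ["A8F3", 1], ["A1ZZ", 3], ["A33Z", 1]]

by_name = sort_rows(rows)                             # first column only
by_name_then_num = sort_rows(rows, sort_cols=(0, 1))  # ties broken by second column

print(by_name)
print(by_name_then_num)
```

Because sorted() is stable, rows that tie on the chosen columns keep their original relative order.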