I think this is a bug, but I am not certain. In the Vision module, the sortDataset() expression handles strings that contain both letters and numbers a little oddly. It sorts character by character for any letters, which is expected, and I would expect the same behavior for every character in the string, whether or not it is a letter. However, it appears that any sequence of digits within the string is treated as a single number at the character position where the sequence starts.
Example:
Unsorted dataset (one string type column):
A33Z
A8F3
A1ZZ
Result of sortDataset Expression (ascending argument = True):
A1ZZ
A8F3
A33Z (because 33 > 8?)
I would expect (and the python list.sort() method matches this):
A1ZZ
A33Z
A8F3
I see the same results on 8.0.10 (Linux) and 7.9.12 (Windows).
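For anyone who wants to reproduce the two orderings outside of Ignition, here is a minimal sketch in plain Python. The `friendly_key` helper is hypothetical: it only approximates the behavior observed above (runs of digits compared as whole numbers) and is not taken from Ignition's actual implementation.

```python
import re

def friendly_key(text):
    # Hypothetical helper: split the string into digit runs and non-digit runs.
    # Digit runs become (0, int) and everything else becomes (1, str), so a
    # number is never compared directly against a string.
    parts = re.split(r'(\d+)', text)
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p != '']

values = ['A33Z', 'A8F3', 'A1ZZ']

# Plain character-by-character comparison, same as list.sort()
plain = sorted(values)

# Digit runs compared as numbers, approximating the observed sortDataset result
friendly = sorted(values, key=friendly_key)
```

With the three values above, `plain` comes out as A1ZZ, A33Z, A8F3 and `friendly` as A1ZZ, A8F3, A33Z, which matches the two listings in this post.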
I could see where the current behavior would be useful (‘PLC_1’, ‘PLC_8’, ‘PLC_100’). If it is intentional, then it looks like the user manual needs a revision (unless I’m reading it wrong). In that case, it would also be great to have an optional flag that switches the sort to a true character-by-character comparison.
Ooph, I just realized this was about the expression function, not the scripting function. It uses the same utility method with the “friendly” sort order, but rolling your own would mean relying on runScript + my previous post.
Bummer. Well, I guess I would change my feature request to creating a second sortDataset function that does the character-by-character sort.
Yeah, we came up with a workaround that is very similar to that post, using the runScript() function. The simplicity of the sortDataset function was nice…makes things a lot cleaner.
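For readers who land here later, the core of such a workaround might look roughly like the sketch below. This is not the code from the post referenced above. The function name `sort_rows_by_char` is made up for illustration, and it works on plain lists of rows so it can be tried outside of Ignition. Inside Ignition you would presumably convert the dataset to rows first (for example with system.dataset.toPyDataSet), rebuild it afterwards (system.dataset.toDataSet), and call the script from the expression via runScript; that wiring is assumed and not shown here.

```python
def sort_rows_by_char(rows, col_index, ascending=True):
    # Hypothetical helper: sort rows by plain string comparison on one column.
    # str() ensures every value is compared character by character, so '33'
    # sorts before '8' because '3' < '8'.
    return sorted(rows,
                  key=lambda row: str(row[col_index]),
                  reverse=not ascending)

# Example rows: one string column plus an arbitrary id column
rows = [['A33Z', 1], ['A8F3', 2], ['A1ZZ', 3]]

sorted_rows = sort_rows_by_char(rows, 0)
```

Since `sorted` returns a new list, the original rows are left untouched, which fits the way datasets are treated as immutable.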
So I guess the only bug here is in the user manual (for both the expression and scripting functions):
Sorts a dataset and returns the sorted dataset. This works on numeric, as well as alphanumeric columns. It will go character by character, going from 0-9, A-Z, a-z.