Sort Dataset on two columns

ematschke · October 24, 2019, 6:35pm

Is there a way to sort a dataset on two different columns. The data in the dataset is the result of a SQL query. Lets say i have a dateset with 10 columns. I have a column that has a flag that is set for example either UP or Down. Another column has a Sequence number that is unique for the two different types. So in the data I have 10 UPs and 10 Downs for a total of 20 rows. I want to sort first on the Type and get all my UPs, then within each type I want to sort by the sequence number so i end up with the first half of the table listing the UPs in order 1, 2,3, and the second half of the table listing the Downs, 1, 2, 3.
I am suspecting I might have to run a query to get all the UPs and Order by Sequence, then run a query to get all the Downs and order by Sequence, then combine the tables without mixing the data and the sorts. Is either of these two options possible?

bpreston · October 24, 2019, 6:53pm

I could see a couple ways to do it. If your doing a query to create the dataset I’d just do and “Order By” in the query. In there you can do both at once and gets rid of the extra step.

If for some reason you can’t do it in the query you can use either the expression sortDataset twice or you can use the script function system.dataset.sort(). Again you would have to run it twice. I’m pretty sure you will need to order it by your sequence number first then by your Up’s and down’s to get it to show how you want it to.

If for some reason you decide you want to do what you mentioned last, you can use system.dataset.appendDataset() to combine the datasets. It should leave them how you put them together, it would just combine the two by putting the second dataset at the bottom.

pturmel · October 24, 2019, 7:08pm

The view() expression function from my Simulation Aids module can do this.

view("SELECT * ORDER BY someColumn, anotherColumn",
    {Root Container.sourceData})

Matrix_Engineering · October 24, 2019, 7:30pm

If you’re displaying the results in a PowerTable; sort the type column and then when holding ctrl click to sort on the sequence column header

ematschke · October 26, 2019, 3:24pm

Decided to go with running two queries into separate datasets, then appending them together. This orders them within the groups, then orders the groups. One time table setup that then will be processed by the operators for a few days, working down the table from top to bottom. We also add a column for “Complete” not contained in the original SQL data so we can track the progress. I can post this simple code if anyone wants it.

cmaynes · March 4, 2020, 8:08pm

I just wanted to respond here with a solution to the problem using Python scripting, with the Python sorted() function. The below function, pydsort, can take a dataset and sort it across multiple columns and provide the intuitive sort results.

import time
import operator
import functools

from com.inductiveautomation.ignition.common import BasicDataset

# helper function just to print out sorted datasets
def print_dataset(d):
	py=system.dataset.toPyDataSet(d)
	print('\t'.join(py.getColumnNames()))
	print('-'*30)
	for i in range(py.rowCount):
		print('\t'.join(py[i]))

# ignition sorted dataset func, with loop for multiple columns
def ignsort(dataset, order_by):
	for oby in order_by:
		dataset = system.dataset.sort(dataset, *oby)
	return dataset

# python sort for multiple columns
def pydsort(dataset, order_by):
	def row_sort_cmp(left, right, order_by):
		for icol, is_asc in order_by:
			order = operator.gt(left[icol], right[icol]) - \
					operator.lt(left[icol], right[icol])
			if order != 0:
				pol = 1 if is_asc else -1
				return order * pol
		return 0
	cols = list(dataset.getColumnNames())
	icols = tuple((cols.index(c), o) for c, o in order_by)
	rows = tuple(tuple(dataset.getValueAt(i, c) for c in cols) for i in range(dataset.rowCount))
	cmp_func = functools.partial(row_sort_cmp, order_by=icols)
	return system.dataset.toDataSet(cols, sorted(rows, cmp=cmp_func))


# create sample dataset
cols = ['col_a', 'col_b', 'col_c']
rows = [
	["1","-1","abc"]
	,["1","2","aab"]
	,["1","2","aaa"]
	,["2","3","b"]
	,["2","4","b"]
	,["2","5","aba"]
	,["3","6","a"]
	,["3","6","b"]
	,["3","6","c"]
	,["6","3","a"]
	,["5","3","a"]
	,["5","3","b"]
	,["4","1","a"]
]
dataset = system.dataset.toDataSet(cols, rows)

# column sort order: (Column Name/Number, Is Ascending boolean)
order_by = [
	('col_a', True),
	('col_b', False),
	('col_c', True)
]


ignds = ignsort(dataset, order_by)

print_dataset(ignds)

pyds = pydsort(dataset, order_by)

print_dataset(pyds)

"""
Results:

col_a	col_b	col_c
------------------------------
3	6	a
5	3	a
6	3	a
4	1	a
1	2	aaa
1	2	aab
2	5	aba
1	-1	abc
3	6	b
2	4	b
2	3	b
5	3	b
3	6	c
col_a	col_b	col_c
------------------------------
1	2	aaa
1	2	aab
1	-1	abc
2	5	aba
2	4	b
2	3	b
3	6	a
3	6	b
3	6	c
4	1	a
5	3	a
5	3	b
6	3	a

You can see from the results, that the Python version sorted across multiple columns and provided the same results you would expect from a SQL order by clause. The Ignition sort results are not consistent with what you would get from SQL.
"""

The solution above works, but you may have to handle a large amount of data and that could be a problem. Taking the same dataset and progressively duplicating and sorting, we can see the Python solution always slower than the Ignition version.

row_mults = [1, 5, 10, 100, 1000, 10000, 25000, 50000, 100000]
times = []
print('\t'.join(['Row Count', 'Ignition Sort', 'Python Sort', 'Python/Ignition']))
for row_mult in row_mults:
	rcount = len(rows) * row_mult
	dataset = system.dataset.toDataSet(cols, rows * row_mult)
	s=time.time()
	ignds = ignsort(dataset, order_by)
	ign_secs = time.time() - s
	s=time.time()
	pyds = pydsort(dataset, order_by)
	py_secs = time.time() - s
	py_ign_mult = 1
	if ign_secs > 0:
		py_ign_mult = py_secs / ign_secs
	print('\t'.join(map(str, [rcount, ign_secs, py_secs, py_ign_mult])))

"""
|Row Count|Ignition Sort|Python Sort|Python/Ignition|
|13|0.0|0.000999927520752|1|
|65|0.0|0.00500011444092|1|
|130|0.000999927520752|0.00600004196167|6.00047687172|
|1300|0.00399994850159|0.0250000953674|6.25010430947|
|13000|0.0339999198914|0.147000074387|4.32354178646|
|130000|0.394000053406|0.957000017166|2.42893372448|
|325000|0.994999885559|2.77200007439|2.78593004343|
|650000|1.56300020218|5.45999979973|3.49328156971|
|1300000|8.10800004005|11.0249998569|1.35976810588|
"""

If you can, sort the data in SQL, especially when your record sets are >=100,000 records. That said, this option is available for when you need to modifying the data or combine data in a client which is not available in the database.

pturmel · March 4, 2020, 8:19pm

I’m confused: In your Ignition sort routine you repeatedly call system.dataset.sort(), which only sorts on one column, replacing the dataset each time through the loop. Why would you think each call would retain any prior sort condition? But you note in a comment that it doesn’t appear to match what SQL would yield, so what is the point of comparing a non-working Ignition sort to your python multi-column sort?

cmaynes · March 6, 2020, 6:10pm

From what @bpreston said, it sounded like he was suggesting to call system.dataset.sort() twice:

In any case, the python code for sorting in a script by multiple columns is still valid.

pturmel · March 6, 2020, 6:51pm

A, missed @bpreston’s suggestion. It’s wrong. Calling system.dataset.sort() twice doesn’t provide any form of chained sorting. The sort from the first call is discarded pseudo-randomly. If you want to sort quickly on multiple columns, make a python list of all the rows, then supply a key function or comparator function to python’s built-in list sort method. (Key function is faster than comparator function, but can’t do mixed order reversal.)

bpreston · March 6, 2020, 7:06pm

@pturmel When I suggested it, I was going off a simple test like what I show below. For the first results, the second sort was commented out. The second shows them both in. I didn’t notice anything that was lost from the first sort with the second. The first sort placed the cities in order and the second then kept the city order but grouped them by timezone in order.
Is there something I’m missing?

pturmel · March 6, 2020, 7:25pm

I don’t think that is a guaranteed behavior. It isn’t documented to do so, and your typical high-performance algorithms won’t do so.

hwbrill · June 8, 2020, 4:33pm

@pturmel, this is something I would be interested in learning more about: based on my previous question… Here’s the scheme: tags in c_name Folder/desk_name Folder/fp_name Tags

The browseTags() actually returns 3215 rows in a 3 col memory tag dataset, which is not exactly what I want to do but, I am getting somewhere in my learning curve. I need help! I need to 1) create a memory dataset tag named ‘FP List’ in each c_name folder/desk_name subfolder. 2) create a memory dataset tag named ‘Desk List’ in each c_name folder. 3) create a memory dataset tag named ‘C List’ under the root, ‘Schafer/’ in this case…
The following is what I’ve been able to accomplish so far…

# Dataset memory tag configuration properties
dataType='DataSet'
tagType='AtomicTag'
valueSource='memory'
# Dataset memory tag configuration list
config={'valueSource':valueSource,'dataType':dataType,'name':'FP List','tagType':tagType}

data=[]

def browseTags(path,filter):

    results=system.tag.browse(path,filter)
    
    for result in results.getResults():
        if result['hasChildren']==True and str(result['tagType'])!='Folder':
            fullPath=str(result['fullPath'])
            splitPath=fullPath.split('/')
            c_name=splitPath[1]
            desk_name=splitPath[2]
            fp_name=splitPath[3]
            data.append([c_name,desk_name,fp_name])
#           print c_name,desk_name,fp_name
        elif str(result['tagType'])=='Folder':
            browseTags(str(result['fullPath']),filter)
#        print c_name,desk_name,fp_name
#        data.append([c_name,desk_name,fp_name])
    system.tag.configure('Schafer',[config],'o')
    
    header=['c_name','desk_name','fp_name']
    
    table=system.dataset.toDataSet(header,data)
    system.tag.write('Schafer/FP List',table)

browseTags('Schafer',{})

Jfrantz · March 5, 2021, 7:01pm

I’m making a Data Instance that has a data set within it.
When I try to use an Expression tag within my Data instance This is not working.

Any suggestions? Presort Data is created From Pyton Script

pturmel · March 5, 2021, 9:03pm

Error messages? Wrong results? How is it not working? (Did you install my module?)

Jfrantz · March 8, 2021, 12:13pm

I have the Module installed and have used it many times.
It seems like the Expression Tag i created within a Tag Data type does not hold the data it finds.
Even a basic view(‘SELECT ITEM_NO’,{[.]PresortHealthData}) returns 0 Row when there should be 46.
Note: Presort Data was created From Script and placed into that Memory Tag

pturmel · March 8, 2021, 4:37pm

Ah, you are experiencing the one unavoidable negative behavior of Transient Datasets–they don’t transfer from gateway to client or designer. What version of Ignition is this? If v7.9, you can wrap the view() function in a toDataSet() function to make it non-transient. This doesn’t work in v8+.

(If you operate on the data entirely in Gateway scope, it’ll be visible there.)

Jfrantz · March 8, 2021, 4:40pm

Thank you...I'll try the toDataSet(). I used View on my power table and made a custom property to hold Tag Data.