Reordering a query resultset by list of ids

nminchin · April 26, 2023, 1:18pm

I think a lambda should be able to handle it. I think this part below in double asterisks just needs to be edited to match row[i] if index 1 of the order is not None, otherwise, match [row[i][0], None]... but I haven't ever spent the time to understand the lamdba function.. edit: although how they're used in that link makes sense

data.sort(key=lambda(row): tuple(**order.index(row[i])** for order, i in zip(order_lists, indices)))

pascal.fragnoud · April 26, 2023, 1:39pm

Using a function would allow you to catch a ValueError from .index() if the value is not found in an order list, and set up a default value there.
Or, make a wrapper for .index() and try/catch there.

def find_index(l, v):
	try:
		return l.index(v)
	except ValueError:
		return None

def sortds(ds, fields, order_lists):
	headers = list(ds.getColumnNames())
	data = list(system.dataset.toPyDataSet(ds))
	indices = [headers.index(f) for f in fields]
	data.sort(key=lambda(row): tuple(find_index(order, row[i]) for order, i in zip(order_lists, indices)))
	return system.dataset.toDataSet(headers, data)

This gets rid of the ValueError you got earlier.
I also believe the ordering should be fine.

nminchin · April 26, 2023, 1:42pm

Yep, that did the trick! I have to learn more about lambda functions... I thought I sort of understood them, but apparently not...
Thanks for your help!

Edit: printing l and v in that function actually helps my understanding of lambda

nminchin · April 26, 2023, 1:50pm

Oops, found it doesn't work for this situation:

ds = system.dataset.builder(part_id=int, part_length=int).build()
ds = system.dataset.addRows(ds, [
	[1000, 2500],
	[1001, 2200],
	[1002, 254],
	[1003, 1250],
	[1, 3700],
	[0, 2200],
	[0, 2200],
	[1, 2250]])
pds = system.dataset.toPyDataSet(ds)
pds = sortds(ds, ['part_id', 'part_length'], [
	[1, 1, 0, 1000, 0, 1001, 1003, 1002], [3700, 2250, 2200, None, 2200, None, None, None]
	])

I added 0 after 1000 in the order list

it orders like this:

Row	part_id	part_length
0	1	3700
1	1	2250
2	0	2200
3	0	2200
4	1000	2500
5	1001	2200
6	1003	1250
7	1002	254

instead of:

Row	part_id	part_length
0	1	3700
1	1	2250
2	0	2200
3	1000	2500
4	0	2200
5	1001	2200
6	1003	1250
7	1002	254

pturmel · April 26, 2023, 1:52pm

Y'all are killing me. Make the DB do all the work:

SELECT id, others
FROM (
	SELECT table.id, pairs.idx, table.others
	FROM table INNER JOIN (
		SELECT ? as id, 0 as idx
		UNION ALL
		SELECT ?, 1
		UNION ALL
		SELECT ?, 2) pairs
	ON table.id = pairs.id
	UNION
	SELECT -3600, 100000000, others
	UNION
	SELECT -25300, 100000001, others
) subq
ORDER BY idx, ....

Instead of dynamically generating a simple list of IDs for an IN clause, construct a constant table of id and idx pairs. The idx part is simply sequential.

{ Edited to make it use runPrepQuery substitutions. }

pascal.fragnoud · April 26, 2023, 2:01pm

Wait, what ?
Why should it sort it like the second ?

What do you expect to happen when a value appears more than once in an order list ?

nminchin · April 26, 2023, 2:06pm

The list of parts is the order a machine will use to create certain part ids. The operator can select part ids in any order and as many of the same part as they like. So the part_ids list could end up with duplicate part_ids throughout the list, non-sequential to each other, like in the "instead of" table above.

@pturmel Handing it off to the DB was definitely my hope! Thanks for the skeleton, now I just need to massage my much-larger-than-example query to suit

However, I don't think quarter to midnight is the time! This will be tomorrow-Nick's problem

bkarabinchak.psi · April 26, 2023, 2:12pm

Hard agree. The database is made for sorting data. Let it do it.

pascal.fragnoud · April 26, 2023, 2:16pm

That changes everything. I didn't realize there were as many elements in the order lists as in the data, and that they were supposed to match.

nminchin · April 28, 2023, 12:04pm

Well I got that query working, but it failed for my 9600 parts. I copied the entire SQL query into npp and it's 393KB. I pasted into pgAdmin and I get this . Gonna try changing this, but anything else I should be doing?

ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
SQL state: 54001

It's fairly long...

pturmel · April 28, 2023, 12:28pm

Oy! How many result rows are you expecting?

Consider a temporary table to hold the ordering pairs. (Or is the ordering already derived from something in the database?)

nminchin · April 28, 2023, 12:29pm

wait what? I didn't know you could do this?

upwards of 9500
I may have to return it in stages, where the user will have to scroll up blocks

the order is derived by the operator, so not from a table anywhere. How would I create the temp table? and would it be better than the screenshot above from chatgpt?

pturmel · April 28, 2023, 12:36pm

Huh. A constant table made from a VALUES clause. That's new to me. I wonder how widespread support for that is. Try?

nminchin · April 28, 2023, 12:46pm

it seems to work in pgadmin i'm just having an issue where the perspective client isn't using the saved new script atm....

Also, do I get an award for teaching you something??? (even if it was chatgpt that taught me )