Sql pareto for column values per row help request

For one thing, you could stop sorting THEN reversing when you could be sorting directly in reverse order, especially if you’re gonna do it in a loop. sorted(iterable, reverse=True)

But more importantly, you’re nesting loops, and some of them for absolutely nothing. Example:

for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]

You’re computing 2 lists for every row, but at the end all you’re left with are the lists for the last row only.

3 Likes

@lrose
thanks, yes for the net and defects being added to the query and matching the list
output of print testThis seems correct, column names and numbers
I did not put the percentage in the query
I thought the script would be best for performing and appending that. Thought to save for last.
I also want to make a prefex list that is appended or extended for showing the Line.

@pascal.fragnoud
I think I do that wasteful looping because I need to learn more about the looping in python
my dataIn only has one row, but in the event to have more rows, I wondered how it would work
I forgot to implement the reverse=true.

I don’t see why I am getting timeout though.


tried

pareto2= [x + testThis for x in pareto]

but that is not right, gave me the list in testThis in there
[[1,2,[3,4],[6,7]], .....
when what I want is more like:
[[1,2,3,4,5,6],...

1~ reason name
2 ~ qty from that reason
3 ~ net column name
4 ~ qty net
5 ~ defect column name
6~ qty defects

if I try

pareto2= [x + testThis[0][0] + testThis[0][1]+ testThis[1][0]+ testThis[1][1] for x in pareto]

throws an error saying can’t concat list because not a str

So maybe I could cast them as str and it would work? Seems like not the way

this kind of works
pareto[0].insert(23,testThis[0][0])

and this worked, but probably should not work haha

for i in range(len(pareto)):
 for j in range(len(testThis)):
  for k in range(len(testThis)):
  	pareto[i].insert(23,testThis[j][k])

I sort of helped it with changing 23 to len(pareto), but not sure how to get the length of testThis rows or col separate since dataset not pydataset if I understand right


for i in range(len(pareto)):
 for j in range(len(testThis)):
  for k in range(len(testThis)):
     pareto[i].insert(len(pareto),testThis[j][k])

Why not just do this:


filterHeaders = reasonHeaders + suffixHeaders
dataOut = []

for row in dataIN:
    data = [ row[colName] for colName in filterHeaders]
    dataOut.append([x,y for y, x in sorted(zip(data,filterHeaders),reverse=True)])

for d in dataOut:
    print d

won’t that sort on the net like it is a reason code and my table with have net some where random?

I think I need to sort the pareto of the reasons for scrap.

Then after I have that sorted list, then I would insert the net and meta data like which machine.

I think after I have this this pareto with the net data, I can then do a percentage from column 1 (second thing in my rows should be a number, and divide the 4th thing, column 4? net number

to display as the new column 8? and column 7 would be the

errr

omg I have confused myself so bad, that I think I put the net title header into the list when I only need the number there, because the net header title is the actual table header

my table headers should be these guys

Reason  |   Scrap Qty  |  Net  |  Total Defects | Percentage of Defects

so

for i in range(len(pareto)):
 for j in range(len(testThis)):
  	pareto[i].insert(len(pareto),testThis[j][1])

I defined suffixHeaders at the headers from the query for net and defects
I defined reasonHeaders and suffixHeaders.
reasonHeaders=[‘thisReason’,…‘nReason’]
suffixHeaders=[‘thisNet’,‘thisDefects’]

	
for row in dataIn:
	data=[row[colName] for colName in reasonHeaders]	
	pareto = [[x, y] for y, x in sorted(zip(data, reasonHeaders),reverse=True)]

print pareto
 



for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]
	
print testThis

print len(pareto)

for i in range(len(pareto)):
 for j in range(len(testThis)):
  	pareto[i].insert(len(pareto),testThis[j][1])

	
print '-----'
	
print pareto


It works, but it is not optimal

and I need to insert math too

tried this to add the math

for i in range(len(pareto)):
	pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))

but first it rounded to a whole number, and I need to check how to handle if they are zeroes
the query addresses nulls with sum(isnull(value,0)) as name

It actually sorts based on data. If you also want to sort just the reasonHeaders then do the sorting prior to concatenating the lists together (remember that lists maintain order) then the outcome will have the order that you want.

filterHeaders = sorted(reasonHeaders, reverse=True) + suffixHeaders

Then you can sort the zip, which will sort based on the first element of the tuple.

for row in dataIN:
    data = [ row[colName] for colName in filterHeaders]
    dataOut.append([x,y for y, x in sorted(zip(data,filterHeaders),reverse=True)])

for d in dataOut:
    print d
1 Like

when I put it in, that throws an error mismatched input for expecting rbrack, but I don’t see a missing brack

the code looks very clean, and I want to try it


how do I fix the rounding and zero cases? I think this part and I am done

for i in range(len(pareto)):
	pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))

first it rounded to a whole number, and I need to check how to handle if they are zeroes
the query addresses nulls with sum(isnull(value,0)) as name

maybe I can

for i in range(len(pareto)):
	if pareto[i][1]!=0 and pareto[i][3] !=0:  
	   pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))
	else :    
	   pareto[i].insert(len(pareto),0)

and maybe I can cast it to a float?

wouldn't it be

dataOut.append([[x,y] for y, x in sorted(zip(data,filterHeaders),reverse=True)])
1 Like

for i in range(len(pareto)):
	if pareto[i][1]!=0 and pareto[i][3] !=0:  
	   pareto[i].insert(len(pareto),(str(round( (float(pareto[i][1])/float(pareto[i][3])),2) )+"%"))
	else:    
	   pareto[i].insert(len(pareto),str(round(float(0.00),2))+"%") 
	

this seems very close, but I think the number came out weird

I will try that next dkhayes

dataOUT should be a list of lists, not a list of list of lists. :face_with_spiral_eyes:

I think it should be:

dataOut.append([x,y] for y, x in sorted(zip(data,filterHeaders),reverse=True))

1 Like

I tried that and the one posted by DKHayes

this posted <generator object at 0x2>
then a ton of garbage from my supposedly commented out stuff

the other
posted like the reason and the number from reason, but nothing of the net or defects
then garbage from my supposed to be commented out stuff

however, I got what I needed to work with the other code stuff
Might not be pretty, but it works

for row in dataIn:
	data=[row[colName] for colName in reasonHeaders]	
	pareto = [[x, y] for y, x in sorted(zip(data, reasonHeaders),reverse=True)]

print pareto

for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]
	
print testThis

print len(pareto)

for i in range(len(pareto)):
 for j in range(len(testThis)):
  	pareto[i].insert(len(pareto),testThis[j][1])

for i in range(len(pareto)):
	if pareto[i][1]!=0 and pareto[i][3] !=0:  
	   pareto[i].insert(len(pareto),(str(round( (float(pareto[i][1])/float(pareto[i][3])),2) )+"%"))
	else:    
	   pareto[i].insert(len(pareto),str(round(float(0.00),2))+"%") 
	
	
print '-----'

print pareto[8][0]
print pareto[8][1]
print pareto[8][3]
print pareto[8][4]
	
print pareto  

If you wanted to get semi-testable data though
this post which doesn't have the additional net and defects I had added, but it was a big help for me

thanks very much, I have learned a ton and bookmarking, sharing


immediately found it is broken for half of machines :frowning:

Yep, you're right, the outside brackets are not needed!

They kind of are, if they’re around a generator, otherwise you get the generator itself and it’s probably not what you want to be appending.

@zacharyw.larson : And here’s my weekly rant about len(range()) !
If there’s a len(range()) in your code, there’s a more pythonic way. Always.

for i in range(len(pareto)):
	for j in range(len(testThis)):
		pareto[i].insert(len(pareto),testThis[j][1])

for i in range(len(pareto)):
	if pareto[i][1]!=0 and pareto[i][3] !=0:  
	   pareto[i].insert(len(pareto),(str(round( (float(pareto[i][1])/float(pareto[i][3])),2) )+"%"))
	else:    
	   pareto[i].insert(len(pareto),str(round(float(0.00),2))+"%")

#### becomes

p_len = len(pareto)
for p in pareto:
	for t in testThis:
		p.insert(p_len, t[1])
	if p[1] != and p[3] != 0:
		p.insert(p_len), "{}%".format(round(float(p[1])/float(p[3]), 2)
	else:
		p.insert(p_len, "0.00%")
	

Find better names for p and t, I have no clue what your lists hold so I just use a terrible, generic name.
Also note that I took out the len(pareto) bit. It’s not gonna change from one iteration to the next, so take it out of the loop and compute it only once instead of doing it every time.
str(round(float(0.00),2))+"%" is always going to be "0.00". Don’t do computations for this, especially repeatedly in a loop !
And since both outer loops are the same, let’s combine them into just one.

Again, I have no clue what’s going on here, I’m simply approaching this from a refactoring standpoint. This is functionally identical, but it’s prettier. And more efficient.

1 Like

I think I had to compute the length each time because in one of the loops I added a thing to the end, and changed the length.

I don’t understand what p[1] is.
I think I have to identify the row and column.

if you use for p in pareto instead of for i in range(len(pareto)), then p[1] is what pareto[i][1] would have been

for i in range(len(pareto)):
	x = pareto[i][1]

for p in pareto:
	x = p[1]

Those do the same things.

how do I know where I am in the table with only one value?
Is this only when there is one row?

for one machine, this script works

for another machine, it says the line with the division doesn’t work. type error flaot() must be a string or a number
so I think it came in as null? I will try to coalesce it in the query. I think this is to be set before the script.

ah weird, I thought I had added isnull(,0) to all the values in my query select.
Some were not there.

You only need one value to index the one you want because you are looping over the rows. So for every row, I want the value in the [1] column of the current row. If you want to know which row you are in you can enumerate it

for i,row in enumerate(pareto):
    ...
1 Like

You know where you are because the variable you get is... exactly where you are.
When using for i in range(len(x)), i is an index, and x[i] will tell you what item of x you're looking at.
With for i in x, i is not an index anymore, it is an item, the one you'd get with x[i] in the previous version.
It works with as many dimensions as you want.

for list_of_lists in list_of_lists_of_lists:
    for list in list_of_lists:
        for item in list:
            item #which would be list_of_lists_of_lists[i][k][k] if you had used indices

Trying to understand

script runs, len(pareto) is 23
Then t[1] gets inserted, so now length is 24?
but the next insert says insert at 23, so the order is wrong no?

You are inserting into the list p not the list of lists pareto. The length of pareto is not changing.

@pascal.fragnoud didn't correct any of the logic in the script, like how you were determining where to insert the value t[1] into p.

Also, if you're wanting to insert an item at the end of a list you should use the append method not insert.

1 Like

As @lrose said, i'm not fixing the logic, because as I said a bit earlier