Sql pareto for column values per row help request

I thought I would get the proof of concept to work in a table on a view.
Then I would use what I had learned to make the actual thing I would display.

My vision is a table of scrap reasons per line sorted by the amount of scrap


(line ) ( scrap reason) (# scrapped) (percent of total scrapped) ( total scrapped)

[ rows of  reasons for scrap that are sorted by the amount scrapped]


I did this for my faults, but I did it with a query.
I had the transaction group setup perfect for it though. One column was the code and the transaction triggered when that code changed.

I am not a fan of working with counters in a plc to setup a table in Ignition.
I prefer to get increments from the plc, make transaction groups capture the increments, then query those to make my counts. After this though, maybe counters will be easier.


wish I got it to work in a query, I want to do some graphs with the data too

I’m not qualified to help with sql, but…

sorted can take a reverse parameter, so you could do this instead:

pareto = [[x, y] for y, x in sorted(zip(data, filteredHeaders), reverse=True)][:5]

Not having several parentheses next to each other makes me happier.

2 Likes

ah

we use reverse for descending

I got this when I tried to print

print dataIN

<PyDataset rows:1 cols:30>

or I get each value on a new line with this

for row in range(dataIN.getRowCount()):
	for col in range(dataIN.getColumnCount()):
		print dataIN.getValueAt(row,col)

trying to get the header next to it, print someheaderthing+dataIN.getValueAt(row,col)
?

oh I see, you zipped the header with the data to get them in the same row,col

The advantage of a pyDataset is you don’t have to go to all of that trouble.

for row in dataIN:
    for value in row:
        print value
2 Likes

I really hope they make basic datasets iterable someday

3 Likes

Here’s a function to help with that.

2 Likes

error: line 14
list object has no attribute getRowCount
lol nooooooo

I defined namedQuery, reasonHeaders, params, and dataIn


6   dataIn=system.dataset.toPyDataSet(system.db.runNamedQuery(namedQuery,params))

7   print datatIn

11  for row in dataIn:
12        filtered headers = [colName for colName in dataIn.getColumnName() if colName in reasonHeaders]

14     for row in range(filteredHeaders.getRowCount()):
15	     for col in range(filteredHeaders.getColumnCount()):
16		   print filteredHeaders.getValueAt(row,col)


Well, your header is not a dataset, you can’t call dataset methods on it

2 Likes

And, you only need to get the filtered headers once.

dataIn=system.dataset.toPyDataSet(system.db.runNamedQuery(namedQuery,params))

print datatIn
filteredHeaders = [colName for colName in dataIn.getColumnNames() if colName in reasonHeaders]

for row in dataIn:
	for col in filteredHeaders:
		print col, ' : ', row[col]
1 Like

so the function you posted to help, that lets you view a dataset from script console

I thought I already could do that with the two ways in LRose’s post?
OHH your post adds the structure, so you can visualize it, got it

oh so I just print the list to check it

print filteredHeaders
1 Like
dataIn=system.dataset.toPyDataSet(value)
	
reasonHeaders = ['thisReason', 'thatReason', 'nReason']


for row in dataIn:
	data=[row[colName] for colName in reasonHeaders]	
	pareto = [[x, y] for y, x in reversed(sorted(zip(data, reasonHeaders)))]

return pareto

omg it is a alive,

on the table, I set field to column_1 and the next one to colomn_2
then I set the headers to reason and quantity

Now I just have to go back and edit the dates into query and some summing

seems to be handling none type well so far, but if I add summing, I probably need isnull


I have it looking great, except when the values are 0 for reason, they are listed in reverse alphabetical order haha

nobody has complained yet

I have to add net and defects to the table, then also line and a percentage of defects per net

However, I am not getting an error, yet it is not working.

time is going too long on the script execute, says interrupt on the button

I defined reasonHeaders and suffixHeaders.
reasonHeaders=[‘thisReason’,…‘nReason’]
suffixHeaders=[‘thisNet’,…‘nCalculation’]

for row in dataIn:
	data=[row[colName] for colName in reasonHeaders]	
	pareto = [[x, y] for y, x in reversed(sorted(zip(data, reasonHeaders)))]

#print pareto



for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]
	
print testThis


for row in pareto:
	for col in pareto:
		pareto.append(testThis)
	
print '-----'
	
print pareto

I thought if I got them both in lists, I could do something like:
for column append

keeps timing out, I am reviewing append function

I am not sure how the append function handles appending a list of lists to another lists of lists, but I’m not certain that’s really what you’re looking for anyway.

Also, I don’t believe there will be any data in testThis, unless there are actually column names in dataIN that actually match the suffixHeaders.

How are you defining “net”, “defects”, “line”, and “percentage of defects per net”?

1 Like

For one thing, you could stop sorting THEN reversing when you could be sorting directly in reverse order, especially if you’re gonna do it in a loop. sorted(iterable, reverse=True)

But more importantly, you’re nesting loops, and some of them for absolutely nothing. Example:

for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]

You’re computing 2 lists for every row, but at the end all you’re left with are the lists for the last row only.

3 Likes

@lrose
thanks, yes for the net and defects being added to the query and matching the list
output of print testThis seems correct, column names and numbers
I did not put the percentage in the query
I thought the script would be best for performing and appending that. Thought to save for last.
I also want to make a prefex list that is appended or extended for showing the Line.

@pascal.fragnoud
I think I do that wasteful looping because I need to learn more about the looping in python
my dataIn only has one row, but in the event to have more rows, I wondered how it would work
I forgot to implement the reverse=true.

I don’t see why I am getting timeout though.


tried

pareto2= [x + testThis for x in pareto]

but that is not right, gave me the list in testThis in there
[[1,2,[3,4],[6,7]], .....
when what I want is more like:
[[1,2,3,4,5,6],...

1~ reason name
2 ~ qty from that reason
3 ~ net column name
4 ~ qty net
5 ~ defect column name
6~ qty defects

if I try

pareto2= [x + testThis[0][0] + testThis[0][1]+ testThis[1][0]+ testThis[1][1] for x in pareto]

throws an error saying can’t concat list because not a str

So maybe I could cast them as str and it would work? Seems like not the way

this kind of works
pareto[0].insert(23,testThis[0][0])

and this worked, but probably should not work haha

for i in range(len(pareto)):
 for j in range(len(testThis)):
  for k in range(len(testThis)):
  	pareto[i].insert(23,testThis[j][k])

I sort of helped it with changing 23 to len(pareto), but not sure how to get the length of testThis rows or col separate since dataset not pydataset if I understand right


for i in range(len(pareto)):
 for j in range(len(testThis)):
  for k in range(len(testThis)):
     pareto[i].insert(len(pareto),testThis[j][k])

Why not just do this:


filterHeaders = reasonHeaders + suffixHeaders
dataOut = []

for row in dataIN:
    data = [ row[colName] for colName in filterHeaders]
    dataOut.append([x,y for y, x in sorted(zip(data,filterHeaders),reverse=True)])

for d in dataOut:
    print d

won’t that sort on the net like it is a reason code and my table with have net some where random?

I think I need to sort the pareto of the reasons for scrap.

Then after I have that sorted list, then I would insert the net and meta data like which machine.

I think after I have this this pareto with the net data, I can then do a percentage from column 1 (second thing in my rows should be a number, and divide the 4th thing, column 4? net number

to display as the new column 8? and column 7 would be the

errr

omg I have confused myself so bad, that I think I put the net title header into the list when I only need the number there, because the net header title is the actual table header

my table headers should be these guys

Reason  |   Scrap Qty  |  Net  |  Total Defects | Percentage of Defects

so

for i in range(len(pareto)):
 for j in range(len(testThis)):
  	pareto[i].insert(len(pareto),testThis[j][1])

I defined suffixHeaders at the headers from the query for net and defects
I defined reasonHeaders and suffixHeaders.
reasonHeaders=[‘thisReason’,…‘nReason’]
suffixHeaders=[‘thisNet’,‘thisDefects’]

	
for row in dataIn:
	data=[row[colName] for colName in reasonHeaders]	
	pareto = [[x, y] for y, x in sorted(zip(data, reasonHeaders),reverse=True)]

print pareto
 



for row in dataIn:
	data2=[row[colName] for colName in suffixHeaders]	
	testThis =[[x,y] for y, x in zip(data2,suffixHeaders)]
	
print testThis

print len(pareto)

for i in range(len(pareto)):
 for j in range(len(testThis)):
  	pareto[i].insert(len(pareto),testThis[j][1])

	
print '-----'
	
print pareto


It works, but it is not optimal

and I need to insert math too

tried this to add the math

for i in range(len(pareto)):
	pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))

but first it rounded to a whole number, and I need to check how to handle if they are zeroes
the query addresses nulls with sum(isnull(value,0)) as name

It actually sorts based on data. If you also want to sort just the reasonHeaders then do the sorting prior to concatenating the lists together (remember that lists maintain order) then the outcome will have the order that you want.

filterHeaders = sorted(reasonHeaders, reverse=True) + suffixHeaders

Then you can sort the zip, which will sort based on the first element of the tuple.

for row in dataIN:
    data = [ row[colName] for colName in filterHeaders]
    dataOut.append([x,y for y, x in sorted(zip(data,filterHeaders),reverse=True)])

for d in dataOut:
    print d
1 Like

when I put it in, that throws an error mismatched input for expecting rbrack, but I don’t see a missing brack

the code looks very clean, and I want to try it


how do I fix the rounding and zero cases? I think this part and I am done

for i in range(len(pareto)):
	pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))

first it rounded to a whole number, and I need to check how to handle if they are zeroes
the query addresses nulls with sum(isnull(value,0)) as name

maybe I can

for i in range(len(pareto)):
	if pareto[i][1]!=0 and pareto[i][3] !=0:  
	   pareto[i].insert(len(pareto),(pareto[i][1]/pareto[i][3]))
	else :    
	   pareto[i].insert(len(pareto),0)

and maybe I can cast it to a float?

wouldn’t it be

dataOut.append([[x,y] for y, x in sorted(zip(data,filterHeaders),reverse=True)])
1 Like