Help inserting rows of totals to a dataset from a query

I want to do both ways.

Will the script overhead be very low? I am expecting the overhead to be low.
It will run once when the page opens.
I think you are right that reporting probably handles this nicely in a grouping.
I just don’t like always how the report looks, and it is not consistent with my other pages.
I think I have to make progress tomorrow morning, so I might do the report side first.


I am going to return to this tomorrow.

I think it will help me with my page, but it is costing me a bit of time.
The request was pretty specific so I think it will be worth it.
It will be a bit nice to figure out.
I think/hope @JordanCClark might appreciate this as a brain teaser.
This is sort of how far or close I got thus far:

### Sample Dataset to simulate query result
import random
headers = ['date', 'shift', 'line', 'hours', 'gross', 'net']
data = []
t_stamp = system.date.now()
for i in range(10):
	if i<4:
	 date=t_stamp
	else:
	 date = system.date.addHours(t_stamp, 8)
	if i<4:
	 shift=1
	else:
	 shift=2
	lineCol=('b' +str(random.randrange(5)))
	hoursCol=random.randrange(12)
	gross= random.randrange(100)
	net=random.randrange(100)
	data.append([date,shift, lineCol,hoursCol,gross,net])
dataIn = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))
#for row in dataIn:
# for value in row:
#  print value

for row in dataIn:
	print [row[colName] for colName in dataIn.getColumnNames()]
print '---'

dataOut=[]
#for row in range(dataIn.getRowCount()):
test=sum( dataIn.getValueAt(row,dataIn.getColumnIndex('net')) for row in range(dataIn.getRowCount()))

print test
def handleDivideZero(a,b):
		return 0 if (a and b)==0 else a/b 
print data
#data.append([t_stamp,0,'      '+' testing this',0,0,test])
print data
#dataOut= system system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))

#index= dataIn[1][1]
test =  (dataIn[row] for row in range(dataIn.getRowCount()) if dataIn[row][1]==2)

for thing in test:
 for value in thing:
  print value


I think if I can slice that last set, then I might be able to use .index to find my inserts for shift 1 and shift 2.
I think I can use .append for shift 3.
Maybe I should just make a dataset of each shift one at a time, then append my totals after each one.
Then append all of datasets at the end.
Maybe I will do that tomorrow.

The script shouldn't be a problem, I just meant maintenance overhead.

1 Like

Will you unpack that for me some?

I don’t know what you mean.

Sometimes data changes, maybe one of the column names changes in the future, or another column of interest is added 12 months down the road, or some sort of edge case pops up. When you come back to add/fix that, now you have a script you have to decipher. Where a report may just need a key, since it handles the (sub)totalizing internally. Less code (scripting in this case) is best code.

2 Likes

You may use SQL query to find the first value in that dataset where shift = 1 or 2.
SELECT TOP 1 * FROM Dataset WHERE Shift = 1;

1 Like

Assuming your query will return something with ORDER BY Timestamp, Shift, Line to give data as in your original post. I would even go so far as to use just the date portion of Timestamp. Then, you are just looking for changes in either the date or the shift.

sampleHeaders = ['Line', 'Shift', 'Hours', 'Gross', 'Net', 'Timestamp']
sampleData = [['K01', 1, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 1, 12, 202, 152, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 1, 12, 203, 153, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K01', 2, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 2, 12, 208, 155, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 2, 12, 209, 157, system.date.parse('2022-09-25', 'yyyy-MM-DD')]
             ]

datasetIn = system.dataset.toPyDataSet(system.dataset.toDataSet(sampleHeaders, sampleData))

#util.printDataSet(datasetIn)
#print ''

headers = list(datasetIn.getColumnNames())
dataOut = []

grossSum = netSum = 0
for row, nextRow in zip(datasetIn, datasetIn[1:]):
	# Add values for the row
	grossSum += row['Gross']
	netSum += row['Net']
	# Append the row into the output data
	dataOut.append(list(row))
	# Compare the timestamp and shift values to the next row.
	# If they're different, append the total row to the output data.
	if row['Timestamp'] != nextRow['Timestamp'] or row['Shift'] != nextRow['Shift']:
		dataOut.append(['Total', None, None, grossSum, netSum, None]) 
		grossSum=0
		netSum = 0
# Process the final row
grossSum += nextRow['Gross']
netSum += nextRow['Net']
dataOut.append(list(nextRow))
dataOut.append(['Total', None, None, grossSum, netSum, None]) 

# Create the dataset
datasetOut = system.dataset.toDataSet(headers, dataOut)

#util.printDataSet(datasetOut)

Output:

row | Line | Shift | Hours | Gross | Net | Timestamp                   
-----------------------------------------------------------------------
0   | K01  | 1     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
1   | K02  | 1     | 12    | 202   | 152 | Tue Jan 25 00:00:00 EST 2022
2   | K03  | 1     | 12    | 203   | 153 | Tue Jan 25 00:00:00 EST 2022
3   | K01  | 2     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
4   | K02  | 2     | 12    | 208   | 155 | Tue Jan 25 00:00:00 EST 2022
5   | K03  | 2     | 12    | 209   | 157 | Tue Jan 25 00:00:00 EST 2022

row | Line  | Shift | Hours | Gross | Net | Timestamp                   
------------------------------------------------------------------------
0   | K01   | 1     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
1   | K02   | 1     | 12    | 202   | 152 | Tue Jan 25 00:00:00 EST 2022
2   | K03   | 1     | 12    | 203   | 153 | Tue Jan 25 00:00:00 EST 2022
3   | Total | None  | None  | 605   | 455 | None                        
4   | K01   | 2     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
5   | K02   | 2     | 12    | 208   | 155 | Tue Jan 25 00:00:00 EST 2022
6   | K03   | 2     | 12    | 209   | 157 | Tue Jan 25 00:00:00 EST 2022
7   | Total | None  | None  | 617   | 462 | None   
3 Likes

Wow @JordanCClark

for row, nextRow in zip(datasetIn, datasetIn[1:]):

Thanks so much. This makes sense. I didn’t realize I could do that.


I will test if these are doing the same:

grosSum= sum(row['Gross']) and grossSum += row['Gross']
update: One is used in the for loop
The other is used better as a list comprehension and not as I wrote it here, and I was confused.


I will test if these are doing the same:
dataOut.append(list(row)) and dataOut.append([row])
This one confuses me a little more.

dataOut.append(list(row))
dataOut2.append([row])

I put them in sequence. Then I tried to print them at the end.
Seemed to add a another nest.
Appending list(row) adds a list of values.
Appending ([row]) adds each value as it’s own list.
I don’t fully understand that.


You have a section for processing the final row.
I am trying to print the list of row and next row and next row to learn, but getting errors.
test = [row, nextRow for (x,y) in zip(datasetIn, datasetIn[1:])]
I think it is because they are not the same length.

However, I can print with:

for row, nextRow in zip(datasetIn, datasetIn[1:]):
	for value in row:
	 print value
	for value in nextRow:
	 print value

I knew that the nextRow would be a shorter by one row.

When you do a for loop with a zip of two lists of unequal lengths, the loop just ends when one of the lists is shorter?

W3schools on zip(): If the passed iterators have different lengths, the iterator with the least items decides the length of the new iterator.

I can’t make a list comprehension of zipping two unequal length lists though?
This worked:

test = ((a,b) for (a,b) in zip(datasetIn, datasetIn[1:]))

Trying to understand it better.


So now, I don’t understand how processing the final worked.

grossSum += nextRow['Gross']
nextRow should contain 1st and 2nd shift data. When I print, I see that it is adding up only the 2nd shift data, but I don't see how.

I tried:

print grossSum
for value in nextRow:
 print value
Output was this: ``` 617 K03 2 12 209 157 Tue Jan 25 00:00:00 EST 2022 >>> ``` nextRow contained only the last row, but why didn't it print ?

Oh, grossSum already added up the previous values for shift2, and it just needed one more.
The last value in nextRow was the final row from dataIn.
I see now.
Thanks so much.

1 Like

Append and lists:

Appending obj to a list will append it directly. Will means, if it's a list, you'll have a list inside your list.
Now, the difference between append(list(row)) and append([row]) is simple. In your case, row is already a list. Maybe not a python built-in list, some kind of wrapper, but a list nonetheless.
So, when you append list(row), you're casting the list into a list, then appending it.
When appending [row], you're making a list that contains row, and appending it.
just try this in your console:

print list(row)
print [row]

and you'll get the same difference.

If you want to append the values of a list to another list, you can use .extend([1, 2, 3]). This will append 1, then append 2, then append 3.


That's because... you're not using the right names. You're unpacking the zip values into (x, y), then trying to print row and nextRow.

If you want to zip two sequences of unequal length and not have it stop when the shortest one is finished, you can use izip_longest from itertools:

from itertools import izip_longest as lzip

foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]

list(lzip(foo, bar))
# > [('a', 1), ('b', 2), ('c', 3), (None, 4)]
list(lzip(foo, bar, fillvalue="boo"))
# > [('a', 1), ('b', 2), ('c', 3), ('boo', 4)]

If padding with None, you can also use map(None, iter1, iter2...):


foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]

map(None, foo, bar)
# > [('a', 1), ('b', 2), ('c', 3), (None, 4)]
1 Like

I didn’t get that last part.

foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]
mapTest=("b00",foo,bar)
print mapTest

got me:

('b00', ['a', 'b', 'c'], [1, 2, 3, 4])


ah I forgot the map.

[('a', 1), ('b', 2), ('c', 3), (None, 4)]


What was that way to print pythondatasets similar to:

print (list(map(list,datasetIn)))

Except with new lines for each list in the lists?
@pascal.fragnoud ?

Frankly I’d just do

for row in pydataset:
    row
# or print repr(row)

if you really want a one liner (or have a single print, which is faster)

print '\n'.join(repr(row) for row in data)

or, if you really want the map:

print '\n'.join(map(repr, data))
2 Likes

I just wanna say... I've become a huge fan of your simplifications

1 Like

Keep up the good work!

print '\n'.join(map(repr, datasetIn))
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>

It is close.

huh, what’s your datasetIn ? Is that a pyDataSet ?

image

1 Like

This one is a pydataset.

Yes.

I don’t like the for loops because I end up with a line for each of the values, rather than a line for each row.

That's... unexpected. Can you show me the whole thing ? Where are you doing this ?

That's not supposed to happen.
image

1 Like
sampleHeaders = ['Line', 'Shift', 'Hours', 'Gross', 'Net', 'Timestamp']
sampleData = [['K01', 1, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 1, 12, 202, 152, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 1, 12, 203, 153, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K01', 2, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 2, 12, 208, 155, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 2, 12, 209, 157, system.date.parse('2022-09-25', 'yyyy-MM-DD')]
             ]

datasetIn = system.dataset.toPyDataSet(system.dataset.toDataSet(sampleHeaders, sampleData))

 


for row in datasetIn:
 row
>>> 
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
>>>

or:

for row in datasetIn:
 for value in row:
 	value
>>> 
u'K01'
1
12
200
150
Tue Jan 25 00:00:00 EST 2022
u'K02'
1
12
202
152
Tue Jan 25 00:00:00 EST 2022
u'K03'
1
12
203
153
Tue Jan 25 00:00:00 EST 2022
u'K01'
2
12
200
150
Tue Jan 25 00:00:00 EST 2022
u'K02'
2
12
208
155
Tue Jan 25 00:00:00 EST 2022
u'K03'
2
12
209
157
Tue Jan 25 00:00:00 EST 2022

Well obviously, if you're nesting loops, you'll get cell values instead of rows.

click for scripts
sampleHeaders = ['Line', 'Shift', 'Hours', 'Gross', 'Net', 'Timestamp']
sampleData = [['K01', 1, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 1, 12, 202, 152, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 1, 12, 203, 153, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K01', 2, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 2, 12, 208, 155, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 2, 12, 209, 157, system.date.parse('2022-09-25', 'yyyy-MM-DD')]
             ]

data = system.dataset.toPyDataSet(system.dataset.toDataSet(sampleHeaders, sampleData))

for row in data:
	row

"#"*10

for row in data:
	print repr(row)

"#"*10

print '\n'.join(map(repr, data))

"#"*10

print '\n'.join(repr(row) for row in data)
click for results

[u'K01', 1, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 1, 12, 202, 152, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 1, 12, 203, 153, Tue Jan 25 00:00:00 CET 2022]
[u'K01', 2, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 2, 12, 208, 155, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 2, 12, 209, 157, Tue Jan 25 00:00:00 CET 2022]
'##########'
[u'K01', 1, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 1, 12, 202, 152, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 1, 12, 203, 153, Tue Jan 25 00:00:00 CET 2022]
[u'K01', 2, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 2, 12, 208, 155, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 2, 12, 209, 157, Tue Jan 25 00:00:00 CET 2022]
'##########'
[u'K01', 1, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 1, 12, 202, 152, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 1, 12, 203, 153, Tue Jan 25 00:00:00 CET 2022]
[u'K01', 2, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 2, 12, 208, 155, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 2, 12, 209, 157, Tue Jan 25 00:00:00 CET 2022]
'##########'
[u'K01', 1, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 1, 12, 202, 152, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 1, 12, 203, 153, Tue Jan 25 00:00:00 CET 2022]
[u'K01', 2, 12, 200, 150, Tue Jan 25 00:00:00 CET 2022]
[u'K02', 2, 12, 208, 155, Tue Jan 25 00:00:00 CET 2022]
[u'K03', 2, 12, 209, 157, Tue Jan 25 00:00:00 CET 2022]

If you want a script to print datasets like @JordanCClark does, I have this in my library:

def to_string(ds):
	headers = ds.getColumnNames()
	widths = [len(str(ds.getRowCount()))] + [
		max(
			max(len(str(value)) for value in ds.getColumnAsList(c)),
			len(headers[c])
		) for c in range(ds.getColumnCount())
	]
	headers = [""] + headers
	ds = system.dataset.toPyDataSet(ds)
	return (
		" | ".join(h.ljust(w) for h, w in zip(headers, widths)) + '\n'
		+ "-" * (sum(widths) + 3*(len(widths)-1)) + '\n'
		+ '\n'.join(" | ".join(str(v).ljust(w) for v, w in zip([i] + [r for r in row], widths)) for i, row in enumerate(ds))
	)

def print_ds(ds):
	print to_string(ds)

It's not pretty, but it can be useful.
note that it print datasets, not pydatasets.

edit: I'm a bit ashamed of showing this, when I spend half my time badgering people about readability and clean code :X

1 Like

If you're wondering about what I use...

1 Like

I was getting an error of keyerror:‘12’ when I tried to implements this.
Well I needed to have 16 values instead of 12 in my append for my query.
Query had 16 columns, not 12.

Other than that, it went great.

params={"machine":-1,"materialPar":-1}
queryData = system.dataset.toPyDataSet(system.db.runNamedQuery("MainFolder/SecondFolder/MyQuery",params))
value=queryData
datasetIn = system.dataset.toPyDataSet(value)

These four lines helped me test with my query.


I should use one of those functions for printings datasets. Thanks again.
I don’t know if I will do that today.


Oh no, someone made gross a string in a table.
This is terrible.

I used int() to make the string an int for summing.
Then I used str() to append a string again.