Help inserting rows of totals to a dataset from a query

zacharyw.larson · September 26, 2022, 7:46pm

How can I merge columns part way through a dataset with an append at the end of each shift in a dataset?

Should I make a new post for this one?
I know how to create a row using append.
I am not sure how create a row that has a different number of columns though.

Input:

|  Line | Shift  | Hours | Gross   | Net  | Timestamp           |
| K01   | 1      | 12    |   200   | 150  | 2022-09-25 07:00:00 |
| K02   | 1      | 12    |   202   | 152  | 2022-09-25 07:00:00 |
| K03   | 1      | 12    |   203   | 153  | 2022-09-25 07:00:00 |
| K01   | 1      | 12    |   200   | 150  | 2022-09-25 15:00:00 |
| K02   | 1      | 12    |   202   | 152  | 2022-09-25 15:00:00 |
| K03   | 1      | 12    |   203   | 153  | 2022-09-25 15:00:00 |

Output

|  Line | Shift  | Hours | Gross   | Net  | Timestamp           |
| K01   | 1      | 12    |   200   | 150  | 2022-09-25 07:00:00 |
| K02   | 1      | 12    |   202   | 152  | 2022-09-25 07:00:00 |
| K03   | 1      | 12    |   203   | 153  | 2022-09-25 07:00:00 |
| Shift 1 totals here    |   605   | 455  | 2022-09-25 07:00:00 |
| K01   | 1      | 12    |   300   | 150  | 2022-09-25 15:00:00 |
| K02   | 1      | 12    |   302   | 152  | 2022-09-25 15:00:00 |
| K03   | 1      | 12    |   303   | 153  | 2022-09-25 15:00:00 |
| Shift 2 totals here    |   905   | 455  | 2022-09-25 07:00:00 |

or

output 2

|  Line | Shift  | Hours | Gross   | Net  | Timestamp           |
| K01   | 1      | 12    |   200   | 150  | 2022-09-25 07:00:00 |
| K02   | 1      | 12    |   202   | 152  | 2022-09-25 07:00:00 |
| K03   | 1      | 12    |   203   | 153  | 2022-09-25 07:00:00 |
| Shift 1 totals here    |   605   | 455                        | 
| K01   | 1      | 12    |   300   | 150  | 2022-09-25 15:00:00 |
| K02   | 1      | 12    |   302   | 152  | 2022-09-25 15:00:00 |
| K03   | 1      | 12    |   303   | 153  | 2022-09-25 15:00:00 |
| Shift 2 totals here    |   905   | 455                        |

My closest attempt, I thought I might find a way to get an index of the row where the shift become 2.
I haven’t figured out how to get that index of that row where the shift is 2 though.
Probably some way to look for an index where a column name is something, and index will take the first one I thought. Here is my most recent try including generating a list and generating a single total:

### Sample Dataset to simulate query result
import random
headers = ['date', 'shift', 'line', 'hours', 'gross', 'net']
data = []
t_stamp = system.date.now()
for i in range(10):
	t_stamp = system.date.addMinutes(t_stamp, 1)
	date=t_stamp
	if i<4:
	 shift=1
	else:
	 shift=2
	lineCol=('b' +str(random.randrange(5)))
	hoursCol=random.randrange(12)
	gross= random.randrange(100)
	net=random.randrange(100)
	data.append([date,shift, lineCol,hoursCol,gross,net])
dataIn = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))
#for row in dataIn:
# for value in row:
#  print value

for row in dataIn:
	print [row[colName] for colName in dataIn.getColumnNames()]
print '---'

dataOut=[]
#for row in range(dataIn.getRowCount()):
test=sum( dataIn.getValueAt(row,dataIn.getColumnIndex('net')) for row in range(dataIn.getRowCount()))

print test
def handleDivideZero(a,b):
		return 0 if (a and b)==0 else a/b 
print data
data.append([t_stamp,0,'      '+' testing this',0,0,test])
print data
#dataOut= system system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))


index = (row.getValueAt('shift') for row in dataIn)
print index

PGriffith · September 26, 2022, 7:53pm

Well, to start - you can’t create a dataset with a different number of columns per row.
So, if you want to stick to the dataset format, you have to pick a value to put in the cells you don’t care about, such as null/None. Or, you can change your data structure from a dataset to something else, depending on where you’re actually using this.

Also - what’s the determination of when to aggregate? Will all rows have exactly the same timestamp, down to the millisecond?

dkhayes117 · September 26, 2022, 7:55pm

Do you have the reporting module? This would be a good task for table grouping.

zacharyw.larson · September 26, 2022, 7:55pm

@PGriffith
Thanks, I will set the output columns to the same number then, and work on the appearance.
Yes, in my query data coming in, the timestamps for each shift will have an exact timestamp match.
(It is funny I messed that up in my test data. I will fix it.)

@dkhayes117 I have the reporting module, but I am trying to work on a table from a query.
I think I see what you mean. I might be able to get the totals at the end, or make sub query groups that I then do totals on. Right now though, I want to make a query and script the totals. Later I might have to work on the way the reporting module will do it so that I can help people print on paper.

I am trying to receive a query binding for a table, and output to the table.

In my testing, I have different timestamps, but yah, in my query, the timestamps will probably be the same.
I am trying to get the index of the first row on the 2nd shift.
I want to insert a total row after the rows grouped by the shift from my query.
(I could use unions in SQL, but scripting totals after is faster it seems.)

Things I can and can’t know when I receive my data:
I never know how many rows any shift will have.
I know I will have rows for 1st, then 2nd, then 3rd.

index = dataIn.index(  dataIn.getRowIndex()  for row in range(dataIn.getRowCount() if dataIn.getValueAt(row,dataIn.getColumnIndex('shift'))=2)
print index

Trying to get an index of the first row to contain an entry for the ‘shift’ column that indicates it is 2nd shift. I used a 2 for this when I created the table for testing.

Getting a syntax error of no alternative at input ‘if’

Also, I am not sure how to breakout of my list comprehension when I do found the first row.

zacharyw.larson · September 26, 2022, 8:09pm

Couldn’t get dataIn.getRowAsList()?

https://docs.inductiveautomation.com/display/DOC81/Datasets

test =  (dataIn[row] for row in range(dataIn.getRowCount()) if dataIn[row][1]==2)

for thing in test:
 for value in thing:
  print value

Might be getting closer.
How do I breakout of a list comprehension at the first row when the if is true?
Or is there a way to slice the first row?

dkhayes117 · September 26, 2022, 8:36pm

Although I don't know your use case in-depth, I think this will be much more manageable with a report since this seems to be reporting type data that you are trying to display. Remember, there is a report viewer available for both Vision and Perspective where the reports can be interactive.

Having subtotals for grouped data is trivial in a report, where scripting will just add more overhead to maintain. Just my 2 cents.

zacharyw.larson · September 26, 2022, 8:42pm

I want to do both ways.

Will the script overhead be very low? I am expecting the overhead to be low.
It will run once when the page opens.
I think you are right that reporting probably handles this nicely in a grouping.
I just don’t like always how the report looks, and it is not consistent with my other pages.
I think I have to make progress tomorrow morning, so I might do the report side first.

I am going to return to this tomorrow.

I think it will help me with my page, but it is costing me a bit of time.
The request was pretty specific so I think it will be worth it.
It will be a bit nice to figure out.
I think/hope @JordanCClark might appreciate this as a brain teaser.
This is sort of how far or close I got thus far:

### Sample Dataset to simulate query result
import random
headers = ['date', 'shift', 'line', 'hours', 'gross', 'net']
data = []
t_stamp = system.date.now()
for i in range(10):
	if i<4:
	 date=t_stamp
	else:
	 date = system.date.addHours(t_stamp, 8)
	if i<4:
	 shift=1
	else:
	 shift=2
	lineCol=('b' +str(random.randrange(5)))
	hoursCol=random.randrange(12)
	gross= random.randrange(100)
	net=random.randrange(100)
	data.append([date,shift, lineCol,hoursCol,gross,net])
dataIn = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))
#for row in dataIn:
# for value in row:
#  print value

for row in dataIn:
	print [row[colName] for colName in dataIn.getColumnNames()]
print '---'

dataOut=[]
#for row in range(dataIn.getRowCount()):
test=sum( dataIn.getValueAt(row,dataIn.getColumnIndex('net')) for row in range(dataIn.getRowCount()))

print test
def handleDivideZero(a,b):
		return 0 if (a and b)==0 else a/b 
print data
#data.append([t_stamp,0,'      '+' testing this',0,0,test])
print data
#dataOut= system system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))

#index= dataIn[1][1]
test =  (dataIn[row] for row in range(dataIn.getRowCount()) if dataIn[row][1]==2)

for thing in test:
 for value in thing:
  print value

I think if I can slice that last set, then I might be able to use .index to find my inserts for shift 1 and shift 2.
I think I can use .append for shift 3.
Maybe I should just make a dataset of each shift one at a time, then append my totals after each one.
Then append all of datasets at the end.
Maybe I will do that tomorrow.

dkhayes117 · September 26, 2022, 8:47pm

The script shouldn't be a problem, I just meant maintenance overhead.

zacharyw.larson · September 26, 2022, 8:48pm

Will you unpack that for me some?

I don’t know what you mean.

dkhayes117 · September 26, 2022, 8:55pm

Sometimes data changes, maybe one of the column names changes in the future, or another column of interest is added 12 months down the road, or some sort of edge case pops up. When you come back to add/fix that, now you have a script you have to decipher. Where a report may just need a key, since it handles the (sub)totalizing internally. Less code (scripting in this case) is best code.

prashants · September 26, 2022, 9:08pm

You may use SQL query to find the first value in that dataset where shift = 1 or 2.
SELECT TOP 1 * FROM Dataset WHERE Shift = 1;

JordanCClark · September 27, 2022, 1:43am

Assuming your query will return something with ORDER BY Timestamp, Shift, Line to give data as in your original post. I would even go so far as to use just the date portion of Timestamp. Then, you are just looking for changes in either the date or the shift.

sampleHeaders = ['Line', 'Shift', 'Hours', 'Gross', 'Net', 'Timestamp']
sampleData = [['K01', 1, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 1, 12, 202, 152, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 1, 12, 203, 153, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K01', 2, 12, 200, 150, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K02', 2, 12, 208, 155, system.date.parse('2022-09-25', 'yyyy-MM-DD')],
              ['K03', 2, 12, 209, 157, system.date.parse('2022-09-25', 'yyyy-MM-DD')]
             ]

datasetIn = system.dataset.toPyDataSet(system.dataset.toDataSet(sampleHeaders, sampleData))

#util.printDataSet(datasetIn)
#print ''

headers = list(datasetIn.getColumnNames())
dataOut = []

grossSum = netSum = 0
for row, nextRow in zip(datasetIn, datasetIn[1:]):
	# Add values for the row
	grossSum += row['Gross']
	netSum += row['Net']
	# Append the row into the output data
	dataOut.append(list(row))
	# Compare the timestamp and shift values to the next row.
	# If they're different, append the total row to the output data.
	if row['Timestamp'] != nextRow['Timestamp'] or row['Shift'] != nextRow['Shift']:
		dataOut.append(['Total', None, None, grossSum, netSum, None]) 
		grossSum=0
		netSum = 0
# Process the final row
grossSum += nextRow['Gross']
netSum += nextRow['Net']
dataOut.append(list(nextRow))
dataOut.append(['Total', None, None, grossSum, netSum, None]) 

# Create the dataset
datasetOut = system.dataset.toDataSet(headers, dataOut)

#util.printDataSet(datasetOut)

Output:

row | Line | Shift | Hours | Gross | Net | Timestamp                   
-----------------------------------------------------------------------
0   | K01  | 1     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
1   | K02  | 1     | 12    | 202   | 152 | Tue Jan 25 00:00:00 EST 2022
2   | K03  | 1     | 12    | 203   | 153 | Tue Jan 25 00:00:00 EST 2022
3   | K01  | 2     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
4   | K02  | 2     | 12    | 208   | 155 | Tue Jan 25 00:00:00 EST 2022
5   | K03  | 2     | 12    | 209   | 157 | Tue Jan 25 00:00:00 EST 2022

row | Line  | Shift | Hours | Gross | Net | Timestamp                   
------------------------------------------------------------------------
0   | K01   | 1     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
1   | K02   | 1     | 12    | 202   | 152 | Tue Jan 25 00:00:00 EST 2022
2   | K03   | 1     | 12    | 203   | 153 | Tue Jan 25 00:00:00 EST 2022
3   | Total | None  | None  | 605   | 455 | None                        
4   | K01   | 2     | 12    | 200   | 150 | Tue Jan 25 00:00:00 EST 2022
5   | K02   | 2     | 12    | 208   | 155 | Tue Jan 25 00:00:00 EST 2022
6   | K03   | 2     | 12    | 209   | 157 | Tue Jan 25 00:00:00 EST 2022
7   | Total | None  | None  | 617   | 462 | None

zacharyw.larson · September 27, 2022, 1:46pm

Wow @JordanCClark

for row, nextRow in zip(datasetIn, datasetIn[1:]):

Thanks so much. This makes sense. I didn’t realize I could do that.

I will test if these are doing the same:

grosSum= sum(row['Gross']) and grossSum += row['Gross']
update: One is used in the for loop
The other is used better as a list comprehension and not as I wrote it here, and I was confused.

I will test if these are doing the same:
dataOut.append(list(row)) and dataOut.append([row])
This one confuses me a little more.

dataOut.append(list(row))
dataOut2.append([row])

I put them in sequence. Then I tried to print them at the end.
Seemed to add a another nest.
Appending list(row) adds a list of values.
Appending ([row]) adds each value as it’s own list.
I don’t fully understand that.

You have a section for processing the final row.
I am trying to print the list of row and next row and next row to learn, but getting errors.
test = [row, nextRow for (x,y) in zip(datasetIn, datasetIn[1:])]
I think it is because they are not the same length.

However, I can print with:

for row, nextRow in zip(datasetIn, datasetIn[1:]):
	for value in row:
	 print value
	for value in nextRow:
	 print value

I knew that the nextRow would be a shorter by one row.

~~When you do a for loop with a zip of two lists of unequal lengths, the loop just ends when one of the lists is shorter?~~

W3schools on zip(): If the passed iterators have different lengths, the iterator with the least items decides the length of the new iterator.

~~I can’t make a list comprehension of zipping two unequal length lists though?~~
This worked:

test = ((a,b) for (a,b) in zip(datasetIn, datasetIn[1:]))

Trying to understand it better.

~~So now, I don’t understand how processing the final worked.~~

~~grossSum += nextRow['Gross']~~
~~nextRow should contain 1st and 2nd shift data. When I print, I see that it is adding up only the 2nd shift data, but I don't see how.~~

~~I tried:~~

~~print grossSum for value in nextRow: print value~~
~~Output was this:~~ ~~``` 617 K03 2 12 209 157 Tue Jan 25 00:00:00 EST 2022 >>> ```~~ ~~nextRow contained only the last row, but why didn't it print ?~~

Oh, grossSum already added up the previous values for shift2, and it just needed one more.
The last value in nextRow was the final row from dataIn.
I see now.
Thanks so much.

pascal.fragnoud · September 27, 2022, 2:22pm

Append and lists:

Appending obj to a list will append it directly. Will means, if it's a list, you'll have a list inside your list.
Now, the difference between append(list(row)) and append([row]) is simple. In your case, row is already a list. Maybe not a python built-in list, some kind of wrapper, but a list nonetheless.
So, when you append list(row), you're casting the list into a list, then appending it.
When appending [row], you're making a list that contains row, and appending it.
just try this in your console:

print list(row)
print [row]

and you'll get the same difference.

If you want to append the values of a list to another list, you can use .extend([1, 2, 3]). This will append 1, then append 2, then append 3.

That's because... you're not using the right names. You're unpacking the zip values into (x, y), then trying to print row and nextRow.

If you want to zip two sequences of unequal length and not have it stop when the shortest one is finished, you can use izip_longest from itertools:

from itertools import izip_longest as lzip

foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]

list(lzip(foo, bar))
# > [('a', 1), ('b', 2), ('c', 3), (None, 4)]
list(lzip(foo, bar, fillvalue="boo"))
# > [('a', 1), ('b', 2), ('c', 3), ('boo', 4)]

If padding with None, you can also use map(None, iter1, iter2...):


foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]

map(None, foo, bar)
# > [('a', 1), ('b', 2), ('c', 3), (None, 4)]

zacharyw.larson · September 27, 2022, 2:30pm

I didn’t get that last part.

foo = ['a', 'b', 'c']
bar = [1, 2, 3, 4]
mapTest=("b00",foo,bar)
print mapTest

got me:

('b00', ['a', 'b', 'c'], [1, 2, 3, 4])

ah I forgot the map.

[('a', 1), ('b', 2), ('c', 3), (None, 4)]

What was that way to print pythondatasets similar to:

print (list(map(list,datasetIn)))

Except with new lines for each list in the lists?
@pascal.fragnoud ?

pascal.fragnoud · September 27, 2022, 2:43pm

Frankly I’d just do

for row in pydataset:
    row
# or print repr(row)

if you really want a one liner (or have a single print, which is faster)

print '\n'.join(repr(row) for row in data)

or, if you really want the map:

print '\n'.join(map(repr, data))

justinedwards.jle · September 27, 2022, 2:47pm

I just wanna say... I've become a huge fan of your simplifications

justinedwards.jle · September 27, 2022, 2:48pm

Keep up the good work!

zacharyw.larson · September 27, 2022, 2:49pm

print '\n'.join(map(repr, datasetIn))

<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>
<Row:6 columns>

It is close.

pascal.fragnoud · September 27, 2022, 2:51pm

huh, what’s your datasetIn ? Is that a pyDataSet ?