Combining Report Data Sources

cole.young34 · September 6, 2023, 6:25pm

Hi Everyone
I am trying to combine two Report Data Sources into one. I found this post and have it working for the most part. Here is my code:

def updateData(data, sample):
	#data['myKey']='Example'
	
	levelDataset = system.dataset.toPyDataSet (data['minmaxLevel'].getCoreResults())
	flowDataset  = system.dataset.toPyDataSet(data['minmaxFlow'].getCoreResults())
	
	headers = ['t_stamp', 'Level', 'Flow']
	dataOut = []
	for levelRow, flowRow in zip(levelDataset, flowDataset):
		dataOut.append([levelRow[0],levelRow[1], flowRow[1]])
	
	data['combinedData'] = system.dataset.toDataSet(headers, dataOut)

For some reason that code only returns data to August 22. Both sources are using the same StartDate and EndDate.

StartDate :

midnight(getDate(2023,7,1))

EndDate:

midnight(addDays(now(), -1))

What am I missing? I have studied the code and spent a lot of time going down the Google Rabbit hole trying to figure out why it is only showing data to 8/22. I tried it in both the CrossTab Component and the Table Component.

The data sources that I am using are Min/Max Aggregation Mode, which returns two values (as opposed to one in the code posted by @JordanCClark in the post linked above). How can I modify this so that I have both the Mix/Max for each one.

Thanks in advance.

JordanCClark · September 6, 2023, 6:43pm

Samples of your input data and a sample of the expected output would help. In your posted code, you only have three headers. where I'm assuming you will want five.

cole.young34 · September 7, 2023, 5:45pm

Hi @JordanCClark

I thought about this some more and changed some stuff, but I am still having some problems.

The input data consists of two tag history queries.

The first one contains two alias keys: level and flow. It is set to an Aggregation Mode Min/Max with a sample size of 24hr Interval. I was originally thinking that I was going to have to grab the min value separate from the max value, but it appears that I can have just one header and use the built in key calculations to access the min/max values.

The second one contains a single alias key: flow total. It is set to an Aggregation Mode Closest Value with a sample size of 24hr.

Both are using the same StartDate and End Date (shown in first post).

The output I would like to achieve is a table that shows a timestamp, total volume, min flow, max flow, min level, max level.

Here is what my script that combines the two data sets:

def updateData(data, sample):
	#data['myKey']='Example'
	
	minMaxDataset = system.dataset.toPyDataSet (data['minMax'].getCoreResults()) 
	totalDataset  = system.dataset.toPyDataSet(data['total'].getCoreResults())   
	
	headers = ['t_stamp', 'Level', 'Flow', 'Total']
	dataOut = []
	for minMaxRow, totalRow in zip(minMaxDataset, totalDataset):
		dataOut.append([minMaxRow[0],minMaxRow[1],minMaxRow[2], totalRow[1]])
	
	data['combinedData'] = system.dataset.toDataSet(headers, dataOut)

Looking at the preview tab, the rows only populate to Aug 19, 2023.

I hope this is enough info. I am relatively new to the whole scripting thing and not really sure all what info needs to be provided.

Thanks

Tim_Carlton · September 8, 2023, 12:35pm

What are you doing with the sample parameter?
Are you sure that data['minMax'] is a dataset?
Are you sure that data['total'] is a dataset?
Does data['minMax'] have any rows after Aug 19, 2023?
Does data['total'] have any rows after Aug 19, 2023?
What is the length (row co90unt) of the data['minMax'] dataset?
What is the length (row count) of the data['total'] dataset?
Do the two datasets have the same number of rows?
Do the dates match on each row in the two datasets?

You've been looking at the script for a while. It might be time to do a sanity check on the source data. If you want help from the forum it would be useful to share the source data.

cole.young34 · September 8, 2023, 1:11pm

Hi Tim

Pretty sure I found the problem. The row count from one to the other is different.

The minMax data set is a Min/Max aggregation which returns two values per date.
The total is a Closest Value aggregation which only returns one.

I finally realized this once I threw each data source in its own table. See attached.

Doing some quick searching, I haven't been able to find any samples that show how to combine data sets that have different amount of rows.

pturmel · September 8, 2023, 1:15pm

That would be a join. Which is really tricky for timestamps since IA's historian uses millisecond precision. I'd probably combine these with a script data source. Or perhaps you could use one of my join expression functions (from Simulation Aids V2).

j.israelsen · September 8, 2023, 2:58pm

I'm joining this conversation late so forgive me if I don't understand correctly. One idea for combining data like this (beyond joining through SQL), is that you could initialize an empty dataset for your full timespan and then fill in your data based on where it matches. For example:

Initialize a dataset to handle up to 15 days (which is the difference between your max and min dates)
For filling your data in, check the difference (for example day difference) from the beginning date to your data date to help place your data into the correct day (row).

JordanCClark · September 9, 2023, 10:39am

Not sure if this one is quite ready for primetime. I started this a few months ago, and just got around to having something workable when there are duplacte column names.

def combine(dataList, commonCol = 't_stamp'):
	from collections import OrderedDict, Counter
	''' Combine multiple datasets based on a common column
		dataList: list of datasets
		commonCol: column name common to all datasets. Default is 't_stamp'	    
	'''
	# Convert all datsets to BasicDataset, if needed
	for i, data in enumerate(dataList):
		if 'com.inductiveautomation.ignition.common.BasicDataset' not in str(type(data)):
			dataList[i] = system.dataset.toDataSet(data)
		
	# Create default value dictionary containing all column names
	# with None values
	blankValueDict = OrderedDict()
	
	# List to use with Counter to use fur a suffix, if needed
	counterList = []
	# Empty list of dictioaries to map incoming column names to new names in case of duplicates
	headerLookups = []

	# Iterate through the dataset headers, looking for duplicate column names
	# and create a dict representing a blank row of values of the new dataset.	
	for data in dataList:
		colNames = list(data.getColumnNames())
		counterList.extend(colNames)
		# count all the columns to find duplicates
		counter = Counter(counterList)
		# Dict to create map from columnName to new name
		colDict = {}
		for col in colNames:
			if col != commonCol:
				if counter[col] > 1:
					# Duplicate column format string
					formatString = '{}_{}'
				else:
					formatString = '{}'
				
				#Add mapping to dicts
				blankValueDict[formatString.format(col, counter[col]-1)] = None
				colDict[col] = formatString.format(col, counter[col]-1)
		headerLookups.append(colDict)	

	# Process the data
	dataDict = OrderedDict()
	for data, headerLookup in zip(dataList, headerLookups):
		colNames = list(data.getColumnNames())
		duplicateCols = []
		for i in xrange(data.rowCount):
			commonColValue = data.getValueAt(i, commonCol)
			if commonColValue not in dataDict.keys():
				dataDict[commonColValue] = blankValueDict.copy()
			for col in colNames:
				if col != commonCol:
					dataDict[commonColValue][headerLookup[col]] = data.getValueAt(i, col)
	
	# Create combined dataset
	headers = [commonCol] + blankValueDict.keys()
	data = []
	# Sort by the common key
	for key in sorted(dataDict.keys()):
		newRow=[]
		newRow.append(key)
		for col in headers[1:]:
			newRow.append(dataDict[key][col])
		data.append(newRow)
		
	return system.dataset.toDataSet(headers, data)

Usage example

##########################################
# Sample datasets
h1 = ['t_stamp', 'Col1', 'Col2']
d1 = [[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]]
ds1 = system.dataset.toDataSet(h1, d1)

h2 = ['t_stamp', 'Col1', 'Col99']
d2 = [[1, 11, 12],
      [3, 13, 14],
      [5, 15, 16],
      [7, 17, 18]]
ds2 = system.dataset.toDataSet(h2, d2)

##########################################

dsOut = util.dataset.combine([ds1, ds2])
util.dataset.printDataSet(dsOut)

print '\n'

dsOut = util.dataset.combine([ds1, ds2], 'Col1')
util.dataset.printDataSet(dsOut)

Output

row | t_stamp | Col1 | Col2 | Col1_1 | Col99
--------------------------------------------
0   | 1       | 2    | 3    | 11     | 12   
1   | 3       | None | None | 13     | 14   
2   | 4       | 5    | 6    | None   | None 
3   | 5       | None | None | 15     | 16   
4   | 7       | 8    | 9    | 17     | 18   


row | Col1 | t_stamp | Col2 | t_stamp_1 | Col99
-----------------------------------------------
0   | 2    | 1       | 3    | None      | None 
1   | 5    | 4       | 6    | None      | None 
2   | 8    | 7       | 9    | None      | None 
3   | 11   | None    | None | 1         | 12   
4   | 13   | None    | None | 3         | 14   
5   | 15   | None    | None | 5         | 16   
6   | 17   | None    | None | 7         | 18