How to query data from two databases (PeopleSoft and Microsoft SQL) into one power table

You can’t do that from the Ignition side; JDBC simply doesn’t support such operations. You have to query each source separately and use a script to match and combine rows from the two sources into a final output dataset. When emulating a JOIN in a script, it is simplest to put the rows of the smaller source dataset into a Python dictionary, keyed on the value (or tuple of values) needed for the ON condition, then extract what you need as you iterate through the rows of the larger source dataset.
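In outline, the technique looks roughly like this. Plain Python tuples stand in for the two query results here, and all the names are hypothetical, just to show the shape of the emulated join:

```python
# Rows from the smaller source (e.g. one database's query): (sn, whs_date)
small_rows = [('SN1', 'date1'), ('SN2', 'date2')]
# Rows from the larger source (e.g. the other database's query): (sn, model, sequence)
large_rows = [('SN1', 'A', 1), ('SN2', 'B', 2), ('SN3', 'C', 3)]

# Index the smaller dataset by the join key (the ON condition).
lookup = dict(small_rows)

# Emulate an INNER JOIN while iterating the larger dataset.
joined = [(sn, model, seq, lookup[sn])
          for sn, model, seq in large_rows
          if sn in lookup]

print(joined)  # [('SN1', 'A', 1, 'date1'), ('SN2', 'B', 2, 'date2')]
```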


Would you have an example script of the datasets-to-dictionaries-to-dataset operation? I have made some attempts, but I’m not sure if I’m headed in the right direction.

	value = system.tag.read("[default]Yard Program/WHS Date").value
	pyvalues = system.dataset.toPyDataSet(value)
	values = []
	for i in range(pyvalues.getRowCount()):
		values.append(pyvalues.getValueAt(i,'YPSerial'))
		values.append(pyvalues.getValueAt(i,'YPWHSDate'))

	result = system.db.runPrepQuery(query, sn + sn1 + parts + models, 'NX_POI')

	dict1 = {}
	for row in result:
		dict1[row[0]] = row[1], row[2], row[3]
	dict2 = {}
	for row in pyvalues:
		dict2[row[0]] = row[1]
	d3 = dict((k, [v] + [dict2[k]] if k in dict2 else []) for (k, v) in dict1.items())
	combineList = list(dict.items(d3))

Attempting to get to a dataset that I can use to populate a power table component.

Can you give a small example (3-5 rows each) of the datasets coming in, and what you expect the output dataset to look like?

I am attempting to join the Model, Sequence, Highlight, and WHSDate columns ON the SN column. Dataset2 lists all SNs available, and Dataset1’s SNs are the desired SNs to populate the output dataset. It would essentially be a JOIN statement if it were a SQL query.

Dataset1(from SQL Query):

row | SN                Model Sequence Highlight
------------------------------------------------
 0  | 1N04060REK0207198 4060R 149,519  0
 1  | 1N04060RCK0207199 4060R 149,523  0
 2  | 1N04060RAK0207200 4060R 149,539  0

Dataset2(from Memory Tag):

row | SN                WHSDate
----------------------------------------------------
 0  | 1N04060REK0207198 Mon Oct 14 00:00:00 CDT 2019
 1  | 1N04060RCK0207199 Fri Oct 04 00:00:00 CDT 2019
 2  | 1N04060RAK0207200 Fri Oct 11 00:00:00 CDT 2019
 3  | 1N04060RCK0207212 Tue Sep 03 00:00:00 CDT 2019
 4  | 1N04038RKK0204150 Wed Oct 09 00:00:00 CDT 2019

Desired Output Dataset:

row | SN                Model Sequence Highlight WHSDate
-----------------------------------------------------------------------------
 0  | 1N04060REK0207198 4060R 149,519  0         Mon Oct 14 00:00:00 CDT 2019
 1  | 1N04060RCK0207199 4060R 149,523  0         Fri Oct 04 00:00:00 CDT 2019
 2  | 1N04060RAK0207200 4060R 149,539  0         Fri Oct 11 00:00:00 CDT 2019

Gotcha. The trick is to put everything in one dictionary. You can use nested dictionaries with the SNs as the top level key.

# Create our sample datasets. In this, I used the dates as strings for display purposes. It shouldn't make any difference for however you get the datasets in. 

headers1 = 	['SN','Model','Sequence','Highlight'] 
data1 = 	[['1N04060REK0207198','4060R','149,519',0],
			 ['1N04060RCK0207199','4060R','149,523',0],
			 ['1N04060RAK0207200','4060R','149,539',0]
			] 

headers2 =	['SN', 'WHSDate']
data2 =		[['1N04060REK0207198',	'Mon Oct 14 00:00:00 CDT 2019'], 
			 ['1N04060RCK0207199',	'Fri Oct 04 00:00:00 CDT 2019'],
			 ['1N04060RAK0207200',	'Fri Oct 11 00:00:00 CDT 2019'],
			 ['1N04060RCK0207212',	'Tue Sep 03 00:00:00 CDT 2019'],
			 ['1N04038RKK0204150',	'Wed Oct 09 00:00:00 CDT 2019']
			]

dataset1 = system.dataset.toPyDataSet(system.dataset.toDataSet(headers1, data1))
dataset2 = system.dataset.toPyDataSet(system.dataset.toDataSet(headers2, data2))

#################################################################################

# Get columnNames from the underlying datasets
columnNames1 = dataset1.getUnderlyingDataset().getColumnNames()
columnNames2 = dataset2.getUnderlyingDataset().getColumnNames()


# Create dictionary. Since the SN is common between the two dataset, we can use it
# as a key name, with the rest of the data as a nested dictionary.
dictOut = {}

for row in dataset1:
	for keyName, col in zip(columnNames1, row):
		if keyName == 'SN':
			dictOut[col] = {}
		if keyName != 'SN':
			dictOut[row['SN']][keyName] = col
			
# Add in the second dataset. If the SN of the row matches one on the SN keys
# in the dictionary, then add the remaining data.			
for row in dataset2:
	if row['SN'] in dictOut.keys():
		for keyName, col in zip(columnNames2, row):
			if keyName != 'SN':
				dictOut[row['SN']][keyName] = col

# The dictionary now looks like this:
#	{
#	u'1N04060RCK0207199': {u'Highlight': 0, u'Sequence': u'149,523', u'Model': u'4060R', u'WHSDate': u'Fri Oct 04 00:00:00 CDT 2019'},
#	u'1N04060REK0207198': {u'Highlight': 0, u'Sequence': u'149,519', u'Model': u'4060R', u'WHSDate': u'Mon Oct 14 00:00:00 CDT 2019'}, 
#	u'1N04060RAK0207200': {u'Highlight': 0, u'Sequence': u'149,539', u'Model': u'4060R', u'WHSDate': u'Fri Oct 11 00:00:00 CDT 2019'}
#	}

# Create a dataset out of this new dictionary:

# Merge the two sets of column names
newHeaders = columnNames1 + [colName for colName in columnNames2 if colName not in columnNames1]
dataOut = []

for SNKey in dictOut.keys():
	newRow = []
	for header in newHeaders:
		if header == 'SN':
			newRow.append(SNKey)
		else:
			newRow.append(dictOut[SNKey][header])
	dataOut.append(newRow)
	
datasetOut = system.dataset.toDataSet(newHeaders, dataOut)

Results:

***  dataset1  ***
row | SN                Model Sequence Highlight
------------------------------------------------
 0  | 1N04060REK0207198 4060R 149,519  0        
 1  | 1N04060RCK0207199 4060R 149,523  0        
 2  | 1N04060RAK0207200 4060R 149,539  0        

***  dataset2  ***
row | SN                WHSDate                     
----------------------------------------------------
 0  | 1N04060REK0207198 Mon Oct 14 00:00:00 CDT 2019
 1  | 1N04060RCK0207199 Fri Oct 04 00:00:00 CDT 2019
 2  | 1N04060RAK0207200 Fri Oct 11 00:00:00 CDT 2019
 3  | 1N04060RCK0207212 Tue Sep 03 00:00:00 CDT 2019
 4  | 1N04038RKK0204150 Wed Oct 09 00:00:00 CDT 2019

*** datasetOut ***
row | SN                Model Sequence Highlight WHSDate                     
-----------------------------------------------------------------------------
 0  | 1N04060RCK0207199 4060R 149,523  0         Fri Oct 04 00:00:00 CDT 2019
 1  | 1N04060REK0207198 4060R 149,519  0         Mon Oct 14 00:00:00 CDT 2019
 2  | 1N04060RAK0207200 4060R 149,539  0         Fri Oct 11 00:00:00 CDT 2019

It just occurred to me that the actual serialization in your process is the last 7 characters of the serial number, and that you would likely want to sort it that way. It actually simplifies things a bit, as we can use those 7 characters as the top-level key instead.

columnNames1 = dataset1.getUnderlyingDataset().getColumnNames()
columnNames2 = dataset2.getUnderlyingDataset().getColumnNames()

dictOut = {}

for row in dataset1:
	rawSerial = row['SN'][-7:]
	dictOut[rawSerial] = {}
	for keyName, col in zip(columnNames1, row):
		dictOut[rawSerial][keyName] = col
		
for row in dataset2:
	rawSerial = row['SN'][-7:]
	if rawSerial in dictOut.keys():
		for keyName, col in zip(columnNames2, row):
			dictOut[rawSerial][keyName] = col

# This method gives us this output:						
#	{
#	u'0207200': {u'Highlight': 0, u'Sequence': u'149,539', u'Model': u'4060R', u'SN': u'1N04060RAK0207200', u'WHSDate': u'Fri Oct 11 00:00:00 CDT 2019'}, 
#	u'0207199': {u'Highlight': 0, u'Sequence': u'149,523', u'Model': u'4060R', u'SN': u'1N04060RCK0207199', u'WHSDate': u'Fri Oct 04 00:00:00 CDT 2019'}, 
#	u'0207198': {u'Highlight': 0, u'Sequence': u'149,519', u'Model': u'4060R', u'SN': u'1N04060REK0207198', u'WHSDate': u'Mon Oct 14 00:00:00 CDT 2019'}
#	}


newHeaders = columnNames1 + [colName for colName in columnNames2 if colName not in columnNames1]
dataOut = []

# Now we can build the dataset using the sorted serialized keys.
for rawKey in sorted(dictOut.keys()):
	newRow = [dictOut[rawKey][header] for header in newHeaders]
	dataOut.append(newRow)
	
datasetOut = system.dataset.toDataSet(newHeaders, dataOut)

Thanks Jordan!! This is perfect! Helps strengthen my ‘dataset’ skills. Also, great idea on using the last 7 characters of the SN! Appreciate it!

One follow-up question on the dataset combine scripting.
If I have serial numbers in dataset1 that are not present in dataset2, I get an error that I believe is saying there are no values to match up to that serial number. Would there be a way to make the WHSDate blank if the serial number does not exist in dataset2? Hopefully this makes sense.
Error received:
(error screenshot omitted)

Might be as simple as adding

from collections import defaultdict

and then initializing dictOut as a defaultdict containing dict objects:

dictOut = defaultdict(dict)

A defaultdict is just like a dictionary, but if the key is missing it starts from a default value (in this case, a dictionary) - so the nested dictionary access will work without a KeyError.
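As a standalone illustration (plain Python, no Ignition calls, placeholder serial), this is the behavior in question:

```python
from collections import defaultdict

# Reading a missing top-level key creates an empty inner dict
# instead of raising a KeyError.
dictOut = defaultdict(dict)
dictOut['0207212']['WHSDate'] = None  # key auto-created, no KeyError

print(dict(dictOut))  # {'0207212': {'WHSDate': None}}
```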

EDIT: Scratch that last. I was looking at the wrong script. :unamused:

This time for sure! :wink:

for row in dataset1:
	rawSerial = row['SN'][-7:]
	dictOut[rawSerial] = {}
	for keyName, col in zip(columnNames1, row):
		dictOut[rawSerial][keyName] = col
	# Add None values for columns in dataset2 not in dataset1 
	for col2 in columnNames2:
		if col2 not in dictOut[rawSerial].keys():
			dictOut[rawSerial][col2] = None

Another nested dictionary question.
Looking to loop through a PyDataSet and use a column value to match up to a top-level dictionary key, then update a nested key with a value from the PyDataSet. Hopefully the script below will help explain. This script is applying the value from the last iteration to all dictionary values. Any idea why this doesn’t work?

queryResult

row | SN   Date       Sequence
------------------------------
 0  | 1111 2020-08-01 1234
 1  | 2222 2020-08-02 2234
 2  | 3333 2020-08-03 3334

dict1 = {'1111': {'Date': None, 'Sequence': None},
         '2222': {'Date': None, 'Sequence': None},
         '3333': {'Date': None, 'Sequence': None}}


for x in queryResult:
	serial = x['SN']
	dict1[serial]['Date'] = x['Date']
	dict1[serial]['Sequence'] = x['Sequence']

Result

dict1 = {'1111': {'Date': '2020-08-03', 'Sequence': 3334},
         '2222': {'Date': '2020-08-03', 'Sequence': 3334},
         '3333': {'Date': '2020-08-03', 'Sequence': 3334}}

Desired result:

dict1 = {'1111': {'Date': '2020-08-01', 'Sequence': 1234},
         '2222': {'Date': '2020-08-02', 'Sequence': 2234},
         '3333': {'Date': '2020-08-03', 'Sequence': 3334}}

Any help would be appreciated.

Try putting Date in quotes in your dictionary.

Typo on my part, the ‘Date’ key was in quotes.
After some further investigation, what appears to be happening is that when using the assignment below on a nested dictionary, it is changing all of the inner [key]s to the set value. So when iterating over them, I am just seeing the value from the last iteration applied to all of the like inner keys. Is this how assignment should work for a nested dictionary?

dict1[topKey][key] = value

You are correct, sir.

How are you constructing your dictionary? Using your example above, this works:

headers = ['SN', 'Date', 'Sequence']
data = [['1111', '2020-08-01', 1234],
		['2222', '2020-08-02', 2234],
		['3333', '2020-08-03', 3334]]
		
queryResult = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))

dict1 = {'1111': {'Date': None, 'Sequence': None},
         '2222': {'Date': None, 'Sequence': None},
         '3333': {'Date': None, 'Sequence': None}}

for x in queryResult:
	serial = x['SN']
	dict1[serial]['Date'] = x['Date']
	dict1[serial]['Sequence'] = x['Sequence']

print dict1

{'1111': {'Sequence': 1234, 'Date': u'2020-08-01'}, '2222': {'Sequence': 2234, 'Date': u'2020-08-02'}, '3333': {'Sequence': 3334, 'Date': u'2020-08-03'}}


This reproduces what you are seeing. In a nutshell, each dict1 entry points to the same subDict object:

headers = ['SN', 'Date', 'Sequence']
data = [['1111', '2020-08-01', 1234],
		['2222', '2020-08-02', 2234],
		['3333', '2020-08-03', 3334]]
		
queryResult = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))

dict1 = {}
subDict = {'Date': None, 'Sequence': None}

for x in queryResult:
	serial = x['SN']
	# Each time we use subDict like this, it refers to the same subDict object.
	# Think of it as an alias.
	dict1[serial] = subDict
	dict1[serial]['Date'] = x['Date']
	dict1[serial]['Sequence'] = x['Sequence']

print dict1

{u'1111': {'Sequence': 3334, 'Date': u'2020-08-03'}, u'2222': {'Sequence': 3334, 'Date': u'2020-08-03'}, u'3333': {'Sequence': 3334, 'Date': u'2020-08-03'}}


In order to use subDict, we need to use the .copy() method:

headers = ['SN', 'Date', 'Sequence']
data = [['1111', '2020-08-01', 1234],
		['2222', '2020-08-02', 2234],
		['3333', '2020-08-03', 3334]]
		
queryResult = system.dataset.toPyDataSet(system.dataset.toDataSet(headers, data))

dict1 = {}
subDict = {'Date': None, 'Sequence': None}

for x in queryResult:
	serial = x['SN']
	# Use copy() method
	dict1[serial] = subDict.copy()
	dict1[serial]['Date'] = x['Date']
	dict1[serial]['Sequence'] = x['Sequence']

print dict1

{u'1111': {'Sequence': 1234, 'Date': u'2020-08-01'}, u'2222': {'Sequence': 2234, 'Date': u'2020-08-02'}, u'3333': {'Sequence': 3334, 'Date': u'2020-08-03'}}


If I construct the dictionary manually as you posted earlier, I am also successful at changing the dictionary values by iterating through the queryResult to reference the top-level key of the dictionary.
It appears the issue is in the construction of the valueless dictionary. I am attempting to use the column names from multiple SQL queries to build the dictionary in the manner below.

columnNames1 = list(query1Result.getUnderlyingDataset().getColumnNames())
columnNames2 = list(query2Result.getUnderlyingDataset().getColumnNames())
columnNamesAll = list(dict.fromkeys(columnNames1 + columnNames2))
columnNamesAll = {col: None for col in columnNamesAll}
del columnNamesAll['SN']

dict1 = {}

for row in query1Result:
	dict1[row['SN']] = columnNamesAll 
print dict1
{u'1111': {u'WarehouseDate': None, u'Sequence': None}, u'2222': {u'WarehouseDate': None, u'Sequence': None}, u'3333': { u'WarehouseDate': None, u'Sequence': None}}

Is there a better way to build this dictionary to avoid using the subDict.copy() method you described? Although that does work with the script I currently have.

copy() is actually the more efficient method for doing this, and it really must be done this way unless you construct an entirely new dictionary each time.
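For comparison, here is a small sketch of the “construct an entirely new dictionary each time” alternative (placeholder serials and columns). `dict(template)` builds a fresh dictionary each pass, equivalent to `template.copy()`:

```python
template = {'Date': None, 'Sequence': None}
dict1 = {}
for serial in ['1111', '2222']:
    # New dict per serial, so the entries do not alias each other.
    dict1[serial] = dict(template)

dict1['1111']['Date'] = '2020-08-01'
# '2222' is unaffected because each entry is its own object.
print(dict1['2222']['Date'])  # None
```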

You can reduce the build of columnNamesAll, however:

columnNamesAll = dict.fromkeys([col for col in columnNames1 + columnNames2 if col != 'SN'], None)

Or, if you’re really in a one-liner sort of mood:

dict1 = {row['SN']:{col:None for col in columnNames1 + columnNames2 if col != 'SN'} for row in queryResult}

Thanks Jordan!