HI,
part of my application I make a SELECT
query from a datawarehouse table that i have read only access. I display the data in a power table and this query runs every 2 hours.
I then want to take this data and write to my own tables so I can trend and analyse.
I can INSERT
to my table using pa.db.insertRows("tableName",mpds)
but the down side to this is if my data is not unique I am duplicating rows.
So my plan is to shift my SELECT
query result which I store in a memory tag of type dataset in to a seperate memory tag and then do a comparison of the two datasets and remove duplicate rows. I tried to use mutablePyDataSet.removeDuplicateRows()
but some cells would change between the two queries leaving me with duplicate rows.
So I want to use the use one column that holds unique data like a PRIMARY KEY
to identify duplicates, by building a list of the case_ids and then comparing the rows in the dataSet to see if that case_id exists but i am struggling to get it wokring
#grab datasets
lastSelect = system.tag.getTagValue("FC Metrics/selectFromBooker")
previousSelect = system.tag.getTagValue("FC Metrics/oldFromBooker")
#convert to mutablePyDatasets
mpdsLastSelect = pa.dataset.toData(lastSelect) #latest select query
mpdsPreviousSelect = pa.dataset.toData(previousSelect) #previous select query
#build list of case ids from previuos select
listCI = []
for row in mpdsPreviousSelect:
row = row["case_id"]
listCI.append(row)
#compare the list of case id's to the last select and remove duplicate rows
for row in mpdsLastSelect:
for case in listCI:
if case in mpdsLastSelect:
mpdsLastSelect.removeRow(row)