I am currently trying to take stored dataset tags I have and form a new dictionary for each dataset (column names would be keys, and values would be values; there is only one row per dataset). When I print the type after reading these tags, it says it's a java.util.ArrayList. When I try to convert it to a PyDataset using system.dataset.toPyDataSet, I get the following error:
TypeError: toPyDataSet(): 1st arg can't be coerced to com.inductiveautomation.ignition.common.Dataset
Is there any way I can convert my datasets to PyDatasets, or is there a way I can form these dictionaries from this datatype using Java scripting (I have no experience in Java)?
I just want to be able to convert it into the following dictionary: {"Flow Rate": 0.0028, "Pressure": 0.0057, "Temperature": 0.0067}.
Keep in mind that I have this dataset stored in a tag; it is only shown on a table for visualization, and I can't do scripting based off the dataset shown on the table. Any advice would be extremely appreciated, thanks!
You say that the tag is a Dataset tag, but what is it really? How are you getting the value from the tag? If I look at a dataset tag that I have in the tag browser, this is what I see.
This is what I can see from the tag. I have the table building as you progress through pressing buttons, and another row gets added below it. If it's possible, it would be helpful to take this dataset from the table with two rows, make each row a dictionary, and then make a list of dicts from that. Not sure if that's possible though.
It is trivial with my Simulation Aids module, with an expression binding like so:
forEach(
    {Coefficients/MLR},
    asMap(it())
)
This will work perfectly in Perspective. The problem you will have in Vision is that Vision has no custom property type that will hold the result. This will be true if you use a script to produce this, too.
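For reference, a script-based version of the same transformation, a minimal sketch; the [default] provider prefix on the tag path is an assumption, matching the paths used later in this thread:

# Read the dataset tag, wrap it as a PyDataset, and build one dict per row.
qv = system.tag.readBlocking(['[default]Coefficients/MLR'])[0]
pyDs = system.dataset.toPyDataSet(qv.value)
cols = list(pyDs.columnNames)
rowDicts = [dict(zip(cols, row)) for row in pyDs]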
Can you show the script that you are using to get the value of the tag, that is producing the TypeError? Specifically, how are you reading the tag, and what path is used?
This is possible, though I am still not convinced this information needs to be stored as a tag. It is a common misconception among new users of the platform that everything needs to be a tag. It doesn't. A custom property is much better suited to this, if there is no reason for these datasets to be global to the system.
It needs to be, because the idea is to generate coefficients for multiple model types based on JSON data that is passed in. The plan was to create a different tag per model type to store them.
system.tag.readBlocking() takes and returns a list. You should be getting your values like this:
tagPaths = ['[default]Coefficients/MLR', '[default]CombinedData/MLR']
dataset1, dataset2 = [system.dataset.toPyDataSet(qv.value) for qv in system.tag.readBlocking(tagPaths)]
Then to convert a dataset to a list of dictionaries you would do something like:
toFlask = [{columnName: row[columnName] for columnName in dataset1.getColumnNames()} for row in dataset1]
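For the one-row dataset described in the original question, the first element of that list is exactly the dictionary you were after:

coeffs = toFlask[0]
# e.g. {"Flow Rate": 0.0028, "Pressure": 0.0057, "Temperature": 0.0067}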
I'd expect it not to matter much for most use cases.
Write for readability/simplicity first, then optimize if needed.
Though I guess if the operations are abstracted away, you might as well use the most performant method... But as long as I haven't timed them, I won't even try to guess whether there's a meaningful performance difference.
But since I'm not usually converting datasets in loops, I haven't felt the need to benchmark/profile it.
Worth testing. I suspect not. Pascal's helper is virtually identical to what I commonly use (I use the names cols, rows, and row where he uses keys, data, and values), at least when I'm not using my SmartMap dictionary subclass to make processing within a loop neater.
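For reference, a minimal sketch of that style of helper using those names (the SmartMap subclass is not shown, and the function name here is illustrative):

def datasetToDicts(ds):
    # One dict per row, keyed by column name.
    cols = list(ds.columnNames)
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]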
Looks like looping through a PyDataset is about twice as fast. For smaller datasets there is still a "significant" difference, but both are fast enough that no user could tell the difference.
Test Script
from java.lang.System import nanoTime

colsInData = 100
rowsInData = 10000

headers = ['col_{}'.format(i) for i in range(colsInData)]
data = [[i for i in range(colsInData)] for j in range(rowsInData)]
testData = system.dataset.toDataSet(headers, data)

def ds_to_dicts(ds):
    cols = ds.columnNames
    return [dict(zip(cols, [ds.getValueAt(rI, col) for col in cols])) for rI in range(ds.rowCount)]

def pyDs_to_dicts(ds):
    cols = ds.columnNames
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]

dsTimes = []
pyDsTimes = []
iterations = 100

for iteration in range(iterations):
    startTime = nanoTime()
    result = ds_to_dicts(testData)
    endTime = nanoTime()
    dsTimes.append(float(endTime - startTime) / 1000000)

    startTime = nanoTime()
    result = pyDs_to_dicts(testData)
    endTime = nanoTime()
    pyDsTimes.append(float(endTime - startTime) / 1000000)

print 'Total Iterations:', iterations, testData
print 'Average Time to Loop through Dataset:', sum(dsTimes) / iterations, 'ms'
print 'Average Time to Loop through PyDataset:', sum(pyDsTimes) / iterations, 'ms'
Results:
>>>
Total Iterations: 100 Dataset [10000R x 100C]
Average Time to Loop through Dataset: 876.992074 ms
Average Time to Loop through PyDataset: 492.043886 ms
>>>
Total Iterations: 100 Dataset [10R x 100C]
Average Time to Loop through Dataset: 0.777953 ms
Average Time to Loop through PyDataset: 0.424622 ms
>>>
You're allocating an extra giant list in the pure Dataset sample. I'd put down money that it makes up the majority of the difference. Try it with xrange.
EDIT: Fixed the formatting on the script because I posted it without formatting like a noob.
xrange Script
from java.lang.System import nanoTime

colsInData = 100
rowsInData = 10000

headers = ['col_{}'.format(i) for i in range(colsInData)]
data = [[i for i in range(colsInData)] for j in range(rowsInData)]
testData = system.dataset.toDataSet(headers, data)

def ds_to_dicts(ds):
    cols = ds.columnNames
    return [dict(zip(cols, [ds.getValueAt(rI, col) for col in cols])) for rI in xrange(ds.rowCount)]

def pyDs_to_dicts(ds):
    cols = ds.columnNames
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]

dsTimes = []
pyDsTimes = []
iterations = 100

for iteration in range(iterations):
    startTime = nanoTime()
    result = ds_to_dicts(testData)
    endTime = nanoTime()
    dsTimes.append(float(endTime - startTime) / 1000000)

    startTime = nanoTime()
    result = pyDs_to_dicts(testData)
    endTime = nanoTime()
    pyDsTimes.append(float(endTime - startTime) / 1000000)

print 'Total Iterations:', iterations, testData
print 'Average Time to Loop through Dataset:', sum(dsTimes) / iterations, 'ms'
print 'Average Time to Loop through PyDataset:', sum(pyDsTimes) / iterations, 'ms'
Results:
>>>
Total Iterations: 100 Dataset [10000R x 100C]
Average Time to Loop through Dataset: 899.515183 ms
Average Time to Loop through PyDataset: 504.400945 ms
>>>
Well, now I'm going to have to dig into it and find out why.
I do have designs to have datasets automatically get wrapped in the PyDataset class in 8.3 (at least to try; we'll see if it breaks everything), avoiding the need for manual conversion.
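For context, the manual conversion that would become unnecessary; a minimal sketch, reusing the testData dataset from the benchmark above:

# Today: convert explicitly before iterating.
pyDs = system.dataset.toPyDataSet(testData)
for row in pyDs:
    # Rows can be indexed by position or by column name.
    print row[0], row['col_0']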