I am currently trying to take stored dataset tags I have and form a new dictionary for each dataset (column names would be keys, and values would be values; there is only one row per dataset). When I print the type after reading these tags, it says it's a java.util.ArrayList. When I try to convert it to a PyDataset using system.dataset.toPyDataSet, I get the following error:
TypeError: toPyDataSet(): 1st arg can't be coerced to com.inductiveautomation.ignition.common.Dataset
Is there any way I can convert my datasets to PyDatasets, or is there a way I can form these dictionaries from this datatype using Java scripting (I have no experience in Java)?
I just want to be able to convert it into the following dictionary: {"Flow Rate": 0.0028, "Pressure": 0.0057, "Temperature": 0.0067}.
Keep in mind that I have this dataset stored in a tag; it is only shown on a table for visualization, and I can't do scripting based off the dataset shown on the table. Any advice would be extremely appreciated, thanks!
You say that the tag is a Dataset tag, but what is it really? How are you getting the value from the tag? If I look at a dataset tag that I have in the tag browser, this is what I see.
This is what I can see from the tag. I have the table building as you progress through pressing buttons, and another row gets added below it. If it's possible, it would be helpful to take this dataset from the table with two rows, make each row a dictionary, and then make a list of dicts from that. Not sure if that's possible though.
It is trivial with my Simulation Aids module, with an expression binding like so:
forEach(
    {Coefficients/MLR},
    asMap(it())
)
This will work perfectly in Perspective. The problem you will have in Vision is that Vision has no custom property type that will hold the result. This will be true if you use a script to produce this, too.
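For reference, a script-based version of the same transformation, a minimal sketch; the [default] provider prefix on the tag path is an assumption, matching the paths used later in this thread:

# Read the dataset tag, wrap it as a PyDataset, and build one dict per row.
qv = system.tag.readBlocking(['[default]Coefficients/MLR'])[0]
pyDs = system.dataset.toPyDataSet(qv.value)
cols = list(pyDs.columnNames)
rowDicts = [dict(zip(cols, row)) for row in pyDs]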
Can you show the script that you are using to get the value of the tag, that is producing the TypeError? Specifically, how are you reading the tag, and what path is used?
This is possible, though I am still not convinced this information needs to be stored as a tag. It is a common misconception among new users of the platform that everything needs to be a tag. It doesn't. A custom property is much better suited to this, if there is no reason for these datasets to be global to the system.
It needs to be, because the idea is to generate coefficients for multiple model types based on JSON data that is passed in. The plan was to create a different tag per model type to store them.
system.tag.readBlocking() takes and returns a list. You should be getting your values like this:
tagPaths = ['[default]Coefficients/MLR', '[default]CombinedData/MLR']
dataset1, dataset2 = [system.dataset.toPyDataSet(qv.value) for qv in system.tag.readBlocking(tagPaths)]
Then to convert a dataset to a list of dictionaries you would do something like:
toFlask = [{columnName: row[columnName] for columnName in dataset1.getColumnNames()} for row in dataset1]
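For the one-row dataset described in the original question, the first element of that list is exactly the dictionary you were after:

coeffs = toFlask[0]
# e.g. {"Flow Rate": 0.0028, "Pressure": 0.0057, "Temperature": 0.0067}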
I'd expect it not to matter much for most use cases.
Write for readability/simplicity first, then optimize if needed.
Though I guess if the operations are abstracted away, you might as well use the most performant method... But as long as I haven't timed them, I won't even try to guess whether there's a meaningful performance difference.
But since I'm not usually converting datasets in loops, I haven't felt the need to benchmark/profile it.
Worth testing. I suspect not. Pascal's helper is virtually identical to what I commonly use (I use the names cols, rows, and row where he uses keys, data, and values), at least when I'm not using my SmartMap dictionary subclass to make processing within a loop neater.
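For reference, a minimal sketch of that style of helper using those names (the SmartMap subclass is not shown, and the function name here is illustrative):

def datasetToDicts(ds):
    # One dict per row, keyed by column name.
    cols = list(ds.columnNames)
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]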
Looks like looping through a PyDataset is about twice as fast. For smaller datasets there is still a "significant" difference, but both are fast enough that no user could tell the difference.
Test Script
from java.lang.System import nanoTime

colsInData = 100
rowsInData = 10000

headers = ['col_{}'.format(i) for i in range(colsInData)]
data = [[i for i in range(colsInData)] for j in range(rowsInData)]
testData = system.dataset.toDataSet(headers, data)

def ds_to_dicts(ds):
    cols = ds.columnNames
    return [dict(zip(cols, [ds.getValueAt(rI, col) for col in cols])) for rI in range(ds.rowCount)]

def pyDs_to_dicts(ds):
    cols = ds.columnNames
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]

dsTimes = []
pyDsTimes = []
iterations = 100

for iteration in range(iterations):
    startTime = nanoTime()
    result = ds_to_dicts(testData)
    endTime = nanoTime()
    dsTimes.append(float(endTime - startTime) / 1000000)

    startTime = nanoTime()
    result = pyDs_to_dicts(testData)
    endTime = nanoTime()
    pyDsTimes.append(float(endTime - startTime) / 1000000)

print 'Total Iterations:', iterations, testData
print 'Average Time to Loop through Dataset:', sum(dsTimes) / iterations, 'ms'
print 'Average Time to Loop through PyDataset:', sum(pyDsTimes) / iterations, 'ms'
Results:
>>>
Total Iterations: 100 Dataset [10000R x 100C]
Average Time to Loop through Dataset: 876.992074 ms
Average Time to Loop through PyDataset: 492.043886 ms
>>>
Total Iterations: 100 Dataset [10R x 100C]
Average Time to Loop through Dataset: 0.777953 ms
Average Time to Loop through PyDataset: 0.424622 ms
>>>
You're allocating an extra giant list in the pure Dataset sample. I'd put down money that it makes up the majority of the difference. Try it with xrange.
EDIT: Fixed the formatting on the script because I posted it without formatting like a noob.
xrange Script
from java.lang.System import nanoTime

colsInData = 100
rowsInData = 10000

headers = ['col_{}'.format(i) for i in range(colsInData)]
data = [[i for i in range(colsInData)] for j in range(rowsInData)]
testData = system.dataset.toDataSet(headers, data)

def ds_to_dicts(ds):
    cols = ds.columnNames
    return [dict(zip(cols, [ds.getValueAt(rI, col) for col in cols])) for rI in xrange(ds.rowCount)]

def pyDs_to_dicts(ds):
    cols = ds.columnNames
    rows = system.dataset.toPyDataSet(ds)
    return [dict(zip(cols, row)) for row in rows]

dsTimes = []
pyDsTimes = []
iterations = 100

for iteration in range(iterations):
    startTime = nanoTime()
    result = ds_to_dicts(testData)
    endTime = nanoTime()
    dsTimes.append(float(endTime - startTime) / 1000000)

    startTime = nanoTime()
    result = pyDs_to_dicts(testData)
    endTime = nanoTime()
    pyDsTimes.append(float(endTime - startTime) / 1000000)

print 'Total Iterations:', iterations, testData
print 'Average Time to Loop through Dataset:', sum(dsTimes) / iterations, 'ms'
print 'Average Time to Loop through PyDataset:', sum(pyDsTimes) / iterations, 'ms'
Results:
>>>
Total Iterations: 100 Dataset [10000R x 100C]
Average Time to Loop through Dataset: 899.515183 ms
Average Time to Loop through PyDataset: 504.400945 ms
>>>
Well, now I'm going to have to dig into it and find out why.
I do have designs to have datasets automatically get wrapped in the PyDataset class in 8.3 (at least to try; we'll see if it breaks everything), avoiding the need for manual conversion.
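For context, the manual conversion that would become unnecessary; a minimal sketch, reusing the testData dataset from the benchmark above:

# Today: convert explicitly before iterating.
pyDs = system.dataset.toPyDataSet(testData)
for row in pyDs:
    # Rows can be indexed by position or by column name.
    print row[0], row['col_0']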