[FEATURE] Add ability to convert a dictionary object to a dataset

Using toDataSet on an object property that effectively defines a table doesn’t work…
TBC when I have more time…

Why do you expect it would? The docs specify that you either need to supply something which is already a dataset, or two list arguments, where one argument is a list of strings representing the headers (column names) and the second is a list of lists which represent the rows of the dataset.

headers = ['one', 'two']
rows = [['r1c1', 'r1c2'], ['r2c1', 'r2c2']]
my_dataset = system.dataset.toDataSet(headers, rows)

I would not expect the following to work, and would expect a script error in the logs:

my_table_data = [{'one': 'r1c1', 'two': 'r1c2'}, {'one': 'r2c1', 'two': 'r2c2'}]
system.dataset.toDataSet(my_table_data)

Hmm, maybe it should be a feature request instead; I'll change it.

A dictionary by definition can only have one value per unique key. I think it's reasonable that this isn't implemented; rather, I think your object isn't defined properly.

For instance, shouldn't this work?

data = {'headers': ['c1', 'c2', 'c3'], 'data': [['r1v1', 'r1v2', 'r1v3'], ['r2v1', 'r2v2', 'r2v3']]}
ds = system.dataset.toDataSet(data['headers'], data['data'])

The only reason I'm using an array of dictionaries in this case is a bug in passing datasets (I would only ever pass small ones) to popups. I do prefer JSON objects, but I need to use the dataset lookup function in an expression. I could use a script transform, but I always prefer the non-scripting method if I can help it, for the sake of simplicity and performance.

Also, my object is more like Cody’s:
my_table_data = [{'one': 'r1c1', 'two': 'r1c2'}, {'one': 'r2c1', 'two': 'r2c2'}]

def dictToDataset(dict):
	data = []	
	columns = dict.keys()
	for key,value in dict.iteritems():
		row = []
		for column in columns:
			row.append([column])
		data.append(row)
	return system.dataset.toDataSet(columns,data)

That's what I use for converting simple dictionaries to datasets.

Perhaps I am missing something here, but won't this code give you a dataset where all of the rows contain the same values (i.e. the column headers)?

tDict = {'one':'r1c1','two':'r1c2'}
data = []
columns = tDict.keys()
for key,value in tDict.iteritems():
    row = []
    for column in columns:
        row.append([column])
    data.append(row)

print data

I would expect the output to be [['r1c1', 'r1c2']], but what it actually outputs is [[['one'],['two']],[['one'],['two']]].

I guess I am not understanding what you are using for your dictionary?

Something like this should do the trick then:

def arrDictToDataset(inputData):
	data = []
	headers = []
	for row in inputData:
		if not headers:
			headers = row.keys()
		values = []
		for k,v in row.iteritems():
			values.append(v)
		data.append(values)
	return system.dataset.toDataSet(headers,data)

Though, admittedly, even this isn't a foolproof function, as dictionaries are unordered, so use with caution. I'm still working on a way to ensure the values end up in the correct columns; perhaps a DatasetBuilder is better.
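In the meantime, a minimal sketch of one way to pin the column order, assuming every dictionary in the list has the same keys (the function name and the sorted-key choice are just illustrative):

def arrDictToDatasetOrdered(inputData):
	# Fix the header order once; sorting is arbitrary but deterministic.
	headers = sorted(inputData[0].keys())
	data = []
	for row in inputData:
		# Index into each dict by header so values always land in the matching column.
		data.append([row[h] for h in headers])
	return system.dataset.toDataSet(headers, data)

The key point is that the same headers list drives both the toDataSet call and the per-row lookups, so the iteration order of the dictionaries no longer matters.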

At the risk of sounding combative, the development experience around datasets in Ignition is just awful.

import pandas as pd
pd.DataFrame(response.json())
#mic drop

In two lines I have exactly what I would expect from an API response that returns key-value pairs.
All this other looping and nested crap is such a distraction from what we actually want to be doing, which is delivering.

Edit: for others seeing this later, I will add a solution for data that comes in the following form, where types may be mixed (tn denotes type n):

my_table_data = [{'one': 'r1c1t1', 'two': 'r1c2t3'}, {'one': 'r2c1t2', 'two': 'r2c2t3'}]

SOLUTION

dictResponse = system.util.jsonDecode(response)

#Assign column headers by grabbing the keys from the first item
columns = []
for i, element in enumerate(dictResponse[0].keys()):
	columns.append(str(element))

## Assume first element has the types you want
types = []
for i,element in enumerate(dictResponse[0].values()):
	#print(i,element)
	types.append(type(element))

### Append the data and ensure types are consistent
data = []
for i,element in enumerate(dictResponse):
	coerced_data = []
	for value, t in zip(element.values(), types):
		#print(i,value,t)
		coerced_data.append(t(value))
	#print(i,element.values())
	
	data.append(coerced_data)
	#system.dataset.toDataSet(element.keys(), element.values())

dataset = system.dataset.toDataSet(columns, data)

Edit: It should be noted that in this code I have assumed the response doesn't return a list of either column headers or types, and thus it relies on the first row of data to declare these things.
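For what it's worth, here is roughly the same logic wrapped into a reusable function, with optional columns/types parameters for the case where the API does supply them separately (the function name and parameters are just illustrative, not an Ignition built-in):

def dictsToDataset(records, columns=None, types=None):
	# Fall back to the first record for headers and types, as above.
	if columns is None:
		columns = [str(k) for k in records[0].keys()]
	if types is None:
		types = [type(records[0][c]) for c in columns]
	data = []
	for record in records:
		# Pull values by column name and coerce each one to the expected type.
		data.append([t(record[c]) for c, t in zip(columns, types)])
	return system.dataset.toDataSet(columns, data)

dataset = dictsToDataset(system.util.jsonDecode(response))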

In my opinion, the better answer isn't "make it easier to make a dataset from a JSON object", it's "make it easier to use JSON objects".

I'm curious (as someone who develops but doesn't use Ignition day-to-day) what the common use cases are for this. You mention converting an API response into a dataset - but, again, why? Can you not use the API response more directly?

Datasets are computationally superior in Java. They are column stores under the hood, with very efficient storage and access. Python lists of dictionaries, which are basically the comparable JSON-ish structure, are a computational nightmare. Unfortunately, the end user needs JSON to make the front end relatively efficient.

I suggest you learn to love datasets. Work with data as datasets until the last moment before you have to send it to the browser. Where possible, work with expression functions, not Jython, so that data on the back end stays in Java.
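For illustration, a rough sketch of what "converting at the last moment" might look like, assuming ds is a dataset you already hold (from a query, tag read, etc.) and you only need row dictionaries when handing data to the front end:

# Keep ds as a dataset for all intermediate work, then flatten to a
# JSON-friendly list of dicts only at the edge.
headers = system.dataset.getColumnHeaders(ds)
rows = []
for r in range(ds.getRowCount()):
	rows.append(dict((h, ds.getValueAt(r, c)) for c, h in enumerate(headers)))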

I agree that consuming an API that could be a dataset should have good tools to make it so.

In my opinion, the better answer isn't "make it easier to make a dataset from a JSON object", it's "make it easier to use JSON objects".

From my end this seems like semantics, but that would be great.

I'm curious (as someone who develops but doesn't use Ignition day-to-day) what the common use cases are for this. You mention converting an API response into a dataset - but, again, why? Can you not use the API response more directly?

It's IoT devices that don't readily hook into OPC or other I/O frameworks.

The common use case that readily comes to mind: I have some vendor with any number of IoT devices independently collecting samples on things.

The first thing I usually have to do is get a dataset of the devices, as that list can change as I get more devices or decommission others. Metadata about a device can also be updated, so I don't really want to have to declare all of that metadata upfront (there are reasons for only getting some of it). This, in my mind, is tabular data.

It's quite intuitive to think of this table updating on some semi-frequent basis. To do this I'll pull from the IoT vendor's API via an Ignition gateway. Off the top, I have described a very flexible way of handling devices coming and going, and it gives a launch pad for sub-setting those devices for various types of data ingestion or decision making.

I actually can't think of an IoT API, or any other, that isn't essentially parsing some JSON response into a more usable format, but maybe that's just what I've been exposed to.

Edit: There is potentially better Ignition functionality that I am not aware of that handles this situation. I myself do not use Ignition on a daily basis, but I do rely on the gateway to act as an interface for data historian purposes.

Datasets are computationally superior in java.

The problem is in their development and communication, not in their computational prowess. Developing around them causes problems in debugging and maintenance, and the extra overhead is there to help alleviate those problems. I can understand that for storing time-series data there are advantages to a column-grouped data store.

Python's lists of dictionaries, which is basically the comparable json-ish object, are a computational nightmare.

How is this true? Isn't the computational complexity for a hash map O(n) at worst and O(1) at best, and for sorting I believe it's O(n log n)?

Pointer bloat, particularly. Every row is a dictionary with its own hash map. Datasets have one hash map for column names. Python dictionaries hold Python objects, which are heavy wrappers around even the most primitive types. Datasets hold efficient Java arrays (by column) of Java objects, which are especially efficient if there are no nulls.

It would be less of an issue if datasets were transformed to column-order JSON.
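Something like this sketch, for instance; I'm assuming getColumnAsList() is the handiest accessor on the dataset object here:

# Sketch: emit a dataset column-wise instead of row-wise.
columnOrder = {}
for c, name in enumerate(system.dataset.getColumnHeaders(ds)):
	columnOrder[name] = list(ds.getColumnAsList(c))
jsonText = system.util.jsonEncode(columnOrder)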

{ It is also unbelievably annoying to lose ordering thanks to Jython dictionaries and Java HashMaps. }

Ah! Thank you for the explanation. I can understand those pieces, but it's not quite what I'm focused on.

My argument is that it doesn't need to be so explicit and that this type of handling should be brought in under the hood. That is what I was pointing to in my previous solution: the code is merely a translation to get to a different structure for further operation.
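To make the ask concrete, the goal is for the call from the original post to just work, with header order and type handling done under the hood:

# The desired usage from the feature request: pass a list of dicts directly
# and let the function work out headers, column order, and types (not supported today).
my_table_data = [{'one': 'r1c1', 'two': 'r1c2'}, {'one': 'r2c1', 'two': 'r2c2'}]
dataset = system.dataset.toDataSet(my_table_data)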