TIP - Converting a dataset column into a list

Something I have done a ton of times is query a list of values from a database and then convert them into a list for some kind of manipulation. Maybe I need to generate a list of users, equipment, or really anything.

I have always converted my datasets into lists the following way:

myList = []
for row in range(dataset.getRowCount()):
	myList.append(dataset.getValueAt(row, "myColumn"))

Well, today I feel dumb because I learned that you can do

myList = dataset.getColumnAsList("myColumn")

I figured I would share in case anyone has somehow missed this being a method on the dataset object like I have!
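For anyone reading this outside of Ignition, the two approaches can be sketched with a plain-Python mock (MockDataset below is hypothetical; only the method names mirror the real dataset object):

```python
# MockDataset is a hypothetical stand-in for an Ignition dataset; only the
# method names (getRowCount, getValueAt, getColumnAsList) mirror the real thing.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getRowCount(self):
        return len(self._rows)

    def getValueAt(self, row, column):
        return self._rows[row][self._columns.index(column)]

    def getColumnAsList(self, column):
        idx = self._columns.index(column)
        return [r[idx] for r in self._rows]

ds = MockDataset(["myColumn", "other"], [[1, "a"], [2, "b"], [3, "c"]])

# The loop-based approach from the post:
myList = []
for row in range(ds.getRowCount()):
    myList.append(ds.getValueAt(row, "myColumn"))

# The one-liner does the same thing in a single call:
assert ds.getColumnAsList("myColumn") == myList == [1, 2, 3]
```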

Enjoy

huh. I too did not know this existed! Thank you!

Beware! That is not defined on the Dataset Interface, but on the BasicDataset implementing class. There are implementations of Dataset, particularly in Gateway scope, that don’t have it.

@pturmel

Can you elaborate further? Will this proposed method always work on datasets returned from a query?

The method is present on the BasicStreamingDataset class that is returned by a query binding in Perspective.

I have noticed, though, that in query bindings it will not take a string parameter, only a column index.

To agree with this: I know that it works on query bindings and on datasets returned from system.db.runNamedQuery(); however, there are surely other instances that can't use it (before 8.0).

ds.getColumnAsList(ds.getColumnIndex("column")) should be safe on any dataset in 8.0+ - getColumnAsList accepting an integer is part of the Dataset interface. Note that if the column isn’t found you’ll get an index out of bounds exception.
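A rough sketch of that safe pattern in plain Python, with a hypothetical mock (per the note above, the real getColumnIndex is case-insensitive and an unknown column produces an index-out-of-bounds error):

```python
# MockDataset is a hypothetical stand-in; getColumnIndex/getColumnAsList only
# mirror the Ignition Dataset interface method names.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getColumnIndex(self, name):
        # Case-insensitive lookup; raises for unknown columns, like the
        # index-out-of-bounds behavior described in the thread.
        lowered = [c.lower() for c in self._columns]
        if name.lower() not in lowered:
            raise IndexError("no column named %r" % name)
        return lowered.index(name.lower())

    def getColumnAsList(self, idx):
        return [r[idx] for r in self._rows]

ds = MockDataset(["ID", "Name"], [[1, "pump"], [2, "valve"]])

# Look up the index by name once, then fetch by index; guard against
# a missing column if the name isn't guaranteed.
try:
    values = ds.getColumnAsList(ds.getColumnIndex("id"))
except IndexError:
    values = []

assert values == [1, 2]
```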

Ah! Hadn’t noticed that change.

Any reason why sometimes I have found this will take a column name and sometimes requires an index?

Blegh! Please use ds.getColumnIndex("column") if you care about speed.

As @pturmel pointed out, it’s not part of the Dataset interface, so implementing classes are not guaranteed to have it. Retrieving by column index is the only ‘safe’ way.

Sure, but we’re using indexOf internally in AbstractDataset :man_shrugging:

For posterity's sake, the following quick test shows that the bottom three methods are all equally fast, and MUCH faster than the iterative way I used to use:


def convertMethod1(ds):
	newRows = []
	for row in range(ds.getRowCount()):
		newRows.append(ds.getValueAt(row, "ID"))
	return newRows

def convertMethod2(ds):
	return ds.getColumnAsList(0)

def convertMethod3(ds):
	return ds.getColumnAsList(ds.columnNames.indexOf("ID"))

def convertMethod4(ds):
	return ds.getColumnAsList(ds.getColumnIndex("ID"))


def timeFunction(func, ds):
	start = system.date.toMillis(system.date.now())
	myRows = func(ds)
	end = system.date.toMillis(system.date.now())
	print "%s took %s milliseconds" % (str(func), end - start)
	


dsRows = []
for i in range(1000000):
	dsRows.append([i,"String %s" % i])

ds = system.dataset.toDataSet(['ID', 'StringVal'], dsRows)

timeFunction(convertMethod1, ds)
timeFunction(convertMethod2, ds)
timeFunction(convertMethod3, ds)
timeFunction(convertMethod4, ds)

With the result set:

<function convertMethod1 at 0x15> took 428 milliseconds
<function convertMethod2 at 0x16> took 9 milliseconds
<function convertMethod3 at 0x17> took 10 milliseconds
<function convertMethod4 at 0x18> took 10 milliseconds
	

But by referencing .columnNames you are calling .getColumnNames() to make a new list–not free–then searching it in a case-sensitive way. .getColumnIndex() is both faster and case insensitive. Of course, in any row-looping algorithm, the column index should be looked up just once, ahead of the loop, but that sort of optimization is lost on most people.
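That hoisting optimization can be sketched like this (MockDataset is a hypothetical stand-in; only the method names mirror Ignition's):

```python
# Illustrates resolving the column index once, ahead of the row loop, instead
# of resolving the name on every iteration. MockDataset is a mock.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getRowCount(self):
        return len(self._rows)

    def getColumnIndex(self, name):
        return [c.lower() for c in self._columns].index(name.lower())

    def getValueAt(self, row, col):
        if isinstance(col, str):
            col = self.getColumnIndex(col)  # name -> index lookup, every call
        return self._rows[row][col]

ds = MockDataset(["ID", "StringVal"], [[i, "String %d" % i] for i in range(5)])

# Slower: the string -> index lookup repeats once per row.
byName = [ds.getValueAt(r, "ID") for r in range(ds.getRowCount())]

# Faster: resolve the index once, ahead of the loop.
idx = ds.getColumnIndex("ID")
byIndex = [ds.getValueAt(r, idx) for r in range(ds.getRowCount())]

assert byName == byIndex == [0, 1, 2, 3, 4]
```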

Question while you're here: is it possible to call the automatic dataset-to-JSON conversion used in a query binding?

I know that there is system.dataset.toJSONObject(), which returns it as pure JSON, but that can't be de-serialized with system.util.jsonDecode() due to the way nulls are stored.

There’s a public static method, but it would only be accessible inside Perspective scripts - it delegates because of special handling we have to use to send quality codes back and forth between Perspective’s frontend and backend. There’s com.inductiveautomation.ignition.common.TypeUtilities#datasetToJSON, but it’s got its own problems, and a very limited representation format.

Would it be reliable enough within Perspective to call it at the end of a script transform?

Should be, it’s exactly the same as what we call at the end of queries.
com.inductiveautomation.perspective.gateway.binding.BindingUtils#datasetToJson

*With a caveat that this is an implementation detail, and a future refactoring of Ignition’s code could theoretically move this method, rename it, change the call signature, etc.

system.dataset.toJSON() sounds like a great idea for a supported function to me :man_shrugging: hahaha

There is system.util.jsonEncode()
https://docs.inductiveautomation.com/display/DOC79/system.util.jsonEncode

We use it to convert datasets to JSON with the WebDev module.

This works if I convert it to a PyDataSet first; however, the aforementioned function converts it into more of an object.

So where jsonEncode will convert it into this format:

{
  "columns": [
    {
      "name": "Column1",
      "type": "java.lang.Double"
    },
    {
      "name": "Column2",
      "type": "java.lang.String"
    }
  ],
  "rows": [
    [
      1.0,
      5842.081
    ],
    [
      3.0,
      8305.222
    ],
    [
      5.0,
      11238.889
    ]
  ]
}

The aforementioned function, built into a Perspective transform, would return it as an object formatted like this:

[
  {
    "Column1": 1.0,
    "Column2": 5842.081
  },
  {
    "Column1": 3.0,
    "Column2": 8305.222
  },
  {
    "Column1": 5.0,
    "Column2": 11238.889
  }
]
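If you're stuck with the jsonEncode-style shape, reshaping it into the row-object form is straightforward in plain Python (a sketch using the standard json module, not an Ignition API):

```python
import json

# Convert the {"columns": [...], "rows": [...]} shape into the
# list-of-row-objects shape that a Perspective query binding emits.
encoded = json.loads("""
{
  "columns": [
    {"name": "Column1", "type": "java.lang.Double"},
    {"name": "Column2", "type": "java.lang.String"}
  ],
  "rows": [
    [1.0, 5842.081],
    [3.0, 8305.222],
    [5.0, 11238.889]
  ]
}
""")

names = [c["name"] for c in encoded["columns"]]
records = [dict(zip(names, row)) for row in encoded["rows"]]

assert records[0] == {"Column1": 1.0, "Column2": 5842.081}
assert len(records) == 3
```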
I'm wondering why this isn't in the scripting functions, since from what I'm reading there are no scope limitations for this function in version 8. It's a neat function, and I had to comb through the forum to find it.
