TIP - Converting a dataset column into a list

Something I have done a ton of times is query a list of values from a database and then convert them into a list for some kind of manipulation. Maybe I need to generate a list of users, equipment, or really anything.

I have always converted my datasets into lists the following way:

myList = []
for row in range(dataset.getRowCount()):
	myList.append(dataset.getValueAt(row, "myColumn"))

Well, today I feel dumb because I learned that you can do

myList = dataset.getColumnAsList("myColumn")

I figured I would share in case anyone has somehow missed this being a method on the dataset object like I have!
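For anyone reading this outside of Ignition, the two approaches can be sketched with a plain-Python mock (MockDataset below is hypothetical; only the method names mirror the real dataset object):

```python
# MockDataset is a hypothetical stand-in for an Ignition dataset; only the
# method names (getRowCount, getValueAt, getColumnAsList) mirror the real thing.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getRowCount(self):
        return len(self._rows)

    def getValueAt(self, row, column):
        return self._rows[row][self._columns.index(column)]

    def getColumnAsList(self, column):
        idx = self._columns.index(column)
        return [r[idx] for r in self._rows]

ds = MockDataset(["myColumn", "other"], [[1, "a"], [2, "b"], [3, "c"]])

# The loop-based approach from the post:
myList = []
for row in range(ds.getRowCount()):
    myList.append(ds.getValueAt(row, "myColumn"))

# The one-liner does the same thing in a single call:
assert ds.getColumnAsList("myColumn") == myList == [1, 2, 3]
```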

Enjoy

huh. I too did not know this existed! Thank you!

Beware! That is not defined on the Dataset Interface, but on the BasicDataset implementing class. There are implementations of Dataset, particularly in Gateway scope, that don’t have it.

@pturmel

Can you elaborate further? Will this proposed method always work on datasets returned from a query?

The method is present on the BasicStreamingDataset class that is returned by a query binding in Perspective.

I have noticed, though, that in query bindings it will not take a string parameter, only a column index.

To agree with this: I know that it works on query bindings and on datasets returned from system.db.runNamedQuery(); however, there are surely other instances that can't use it (before 8.0).

ds.getColumnAsList(ds.getColumnIndex("column")) should be safe on any dataset in 8.0+ - getColumnAsList accepting an integer is part of the Dataset interface. Note that if the column isn’t found you’ll get an index out of bounds exception.
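A rough sketch of that safe pattern in plain Python, with a hypothetical mock (per the note above, the real getColumnIndex is case-insensitive and an unknown column produces an index-out-of-bounds error):

```python
# MockDataset is a hypothetical stand-in; getColumnIndex/getColumnAsList only
# mirror the Ignition Dataset interface method names.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getColumnIndex(self, name):
        # Case-insensitive lookup; raises for unknown columns, like the
        # index-out-of-bounds behavior described in the thread.
        lowered = [c.lower() for c in self._columns]
        if name.lower() not in lowered:
            raise IndexError("no column named %r" % name)
        return lowered.index(name.lower())

    def getColumnAsList(self, idx):
        return [r[idx] for r in self._rows]

ds = MockDataset(["ID", "Name"], [[1, "pump"], [2, "valve"]])

# Look up the index by name once, then fetch by index; guard against
# a missing column if the name isn't guaranteed.
try:
    values = ds.getColumnAsList(ds.getColumnIndex("id"))
except IndexError:
    values = []

assert values == [1, 2]
```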

Ah! Hadn’t noticed that change.

Any reason why sometimes I have found this will take a column name and sometimes requires an index?

Blegh! Please use ds.getColumnIndex("column") if you care about speed.

As @pturmel pointed out, it’s not part of the Dataset interface, so implementing classes are not guaranteed to have it. Retrieving by column index is the only ‘safe’ way.

Sure, but we’re using indexOf internally in AbstractDataset :man_shrugging:

For posterity's sake, the following quick test shows that the bottom three methods are all equally fast, and MUCH faster than the iterative way I used to use:


def convertMethod1(ds):
	newRows = []
	for row in range(ds.getRowCount()):
		newRows.append(ds.getValueAt(row, "ID"))
	return newRows

def convertMethod2(ds):
	return ds.getColumnAsList(0)

def convertMethod3(ds):
	return ds.getColumnAsList(ds.columnNames.indexOf("ID"))

def convertMethod4(ds):
	return ds.getColumnAsList(ds.getColumnIndex("ID"))


def timeFunction(func, ds):
	start = system.date.toMillis(system.date.now())
	myRows = func(ds)
	end = system.date.toMillis(system.date.now())
	print "%s took %s milliseconds" % (str(func), end - start)
	


dsRows = []
for i in range(1000000):
	dsRows.append([i,"String %s" % i])

ds = system.dataset.toDataSet(['ID', 'StringVal'], dsRows)

timeFunction(convertMethod1, ds)
timeFunction(convertMethod2, ds)
timeFunction(convertMethod3, ds)
timeFunction(convertMethod4, ds)

With the result set:

<function convertMethod1 at 0x15> took 428 milliseconds
<function convertMethod2 at 0x16> took 9 milliseconds
<function convertMethod3 at 0x17> took 10 milliseconds
<function convertMethod4 at 0x18> took 10 milliseconds
	

But by referencing .columnNames you are calling .getColumnNames() to make a new list–not free–then searching it in a case-sensitive way. .getColumnIndex() is both faster and case insensitive. Of course, in any row-looping algorithm, the column index should be looked up just once, ahead of the loop, but that sort of optimization is lost on most people.
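That hoisting optimization can be sketched like this (MockDataset is a hypothetical stand-in; only the method names mirror Ignition's):

```python
# Illustrates resolving the column index once, ahead of the row loop, instead
# of resolving the name on every iteration. MockDataset is a mock.
class MockDataset:
    def __init__(self, columns, rows):
        self._columns = columns
        self._rows = rows

    def getRowCount(self):
        return len(self._rows)

    def getColumnIndex(self, name):
        return [c.lower() for c in self._columns].index(name.lower())

    def getValueAt(self, row, col):
        if isinstance(col, str):
            col = self.getColumnIndex(col)  # name -> index lookup, every call
        return self._rows[row][col]

ds = MockDataset(["ID", "StringVal"], [[i, "String %d" % i] for i in range(5)])

# Slower: the string -> index lookup repeats once per row.
byName = [ds.getValueAt(r, "ID") for r in range(ds.getRowCount())]

# Faster: resolve the index once, ahead of the loop.
idx = ds.getColumnIndex("ID")
byIndex = [ds.getValueAt(r, idx) for r in range(ds.getRowCount())]

assert byName == byIndex == [0, 1, 2, 3, 4]
```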

Question while you're here: is it possible to call the automatic dataset-to-JSON conversion used in a query binding?

I know that there is system.dataset.toJSONObject(), which returns it as pure JSON, but that can't be de-serialized with system.util.jsonDecode() due to the way nulls are stored.

There’s a public static method, but it would only be accessible inside Perspective scripts - it delegates because of special handling we have to use to send quality codes back and forth between Perspective’s frontend and backend. There’s com.inductiveautomation.ignition.common.TypeUtilities#datasetToJSON, but it’s got its own problems, and a very limited representation format.

Would it be reliable enough within Perspective to call it at the end of a script transform?

Should be, it’s exactly the same as what we call at the end of queries.
com.inductiveautomation.perspective.gateway.binding.BindingUtils#datasetToJson

*With a caveat that this is an implementation detail, and a future refactoring of Ignition’s code could theoretically move this method, rename it, change the call signature, etc.

system.dataset.toJSON() sounds like a great idea for a supported function to me :man_shrugging: hahaha

There is system.util.jsonEncode()
https://docs.inductiveautomation.com/display/DOC79/system.util.jsonEncode

We use it to convert datasets to JSON with the WebDev module.

This works if I convert it to a PyDataSet first; however, the aforementioned function converts it into more of an object.

So where jsonEncode will convert it into this format:

{
  "columns": [
    {
      "name": "Column1",
      "type": "java.lang.Double"
    },
    {
      "name": "Column2",
      "type": "java.lang.String"
    }
  ],
  "rows": [
    [
      1.0,
      5842.081
    ],
    [
      3.0,
      8305.222
    ],
    [
      5.0,
      11238.889
    ]
  ]
}

The aforementioned function, built into a Perspective transform, would return it as an object formatted like this:

[
  {
    "Column1": 1.0,
    "Column2": 5842.081
  },
  {
    "Column1": 3.0,
    "Column2": 8305.222
  },
  {
    "Column1": 5.0,
    "Column2": 11238.889
  }
]
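If you're stuck with the jsonEncode-style shape, reshaping it into the row-object form is straightforward in plain Python (a sketch using the standard json module, not an Ignition API):

```python
import json

# Convert the {"columns": [...], "rows": [...]} shape into the
# list-of-row-objects shape that a Perspective query binding emits.
encoded = json.loads("""
{
  "columns": [
    {"name": "Column1", "type": "java.lang.Double"},
    {"name": "Column2", "type": "java.lang.String"}
  ],
  "rows": [
    [1.0, 5842.081],
    [3.0, 8305.222],
    [5.0, 11238.889]
  ]
}
""")

names = [c["name"] for c in encoded["columns"]]
records = [dict(zip(names, row)) for row in encoded["rows"]]

assert records[0] == {"Column1": 1.0, "Column2": 5842.081}
assert len(records) == 3
```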
I'm wondering why this isn't in the scripting functions, since from what I'm reading there are no scope limitations for this function in version 8. It's a neat function, and I had to comb through the forum to find it.
