Get Data from Dataset "safely"

I have a dataset (or PyDataset).
I loop through the data, something like this:

for row in dataset:
    print row["COLUMN_1"]
    print row["COLUMN_2"]
    print row["COLUMN_3"]
    print row["COLUMN_4"]
    print row["COLUMN_5"]

However, any of the columns might not exist (I can't know in advance). Is there any way of accessing the data safely so it doesn't throw an exception?
I believe there is no built-in safe way, so I'm really looking for any scripting tip to do it. The idea is to print the row value, or print 0 if the column doesn't exist.

Thanks!

I don't think there is a straightforward way to do this. Maybe you can use the
getColumnNames() function and convert the result into a set. Then, before printing, you just do a membership check:

columnNameSet = set(dataset.getColumnNames())
for row in dataset:
	if "COLUMN_1" in columnNameSet:
		print row["COLUMN_1"]
	else:
		print 0
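
The same check can be extended to several columns at once. Here is a minimal sketch, assuming plain-Python stand-ins (a list of names in place of dataset.getColumnNames() and dicts in place of the rows); safe_values is a hypothetical helper name, not part of Ignition:

```python
def safe_values(row, available, wanted, default=0):
    """Return one value per wanted column, using `default` for absent columns."""
    return [row[name] if name in available else default for name in wanted]

# Stand-ins (assumptions): in Ignition these would come from the dataset itself.
column_names = ["COLUMN_1", "COLUMN_3"]
rows = [
    {"COLUMN_1": 10, "COLUMN_3": 30},
    {"COLUMN_1": 11, "COLUMN_3": 31},
]

available = set(column_names)  # build the set once, outside the loop
wanted = ["COLUMN_1", "COLUMN_2", "COLUMN_3"]

results = [safe_values(row, available, wanted) for row in rows]
for values in results:
    print(values)
```

Building the set once keeps each per-row lookup cheap, which matters more as the row count grows.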

Probably not great for performance (I haven't tested on a large dataset), but you could convert your PyDataset to a list of dicts like this:

data = [{column:row[column] for column in ds.getColumnNames()} for row in ds]

Then you can leverage the dictionary .get() method which allows you to provide a key and default return value if it does not exist.

for row in data:
    value = row.get("someColumn", None) # or whatever you want your default value to be if it's not found
    if value is not None:
        # value was found, do something
        print value
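
One caveat with a None default: a column that exists but holds a null cell looks the same as a missing column. A possible way around that is a sentinel default. This is only a sketch over a stand-in list of dicts; _MISSING and describe are hypothetical names:

```python
_MISSING = object()  # unique marker that no real cell value can be

# Stand-in (assumption) for the list of dicts built from the PyDataset.
data = [
    {"name": "foo", "val": None},
    {"name": "bar", "val": 2},
]

def describe(row, column):
    """Tell a missing column apart from a null cell."""
    value = row.get(column, _MISSING)
    if value is _MISSING:
        return "no such column"
    if value is None:
        return "null cell"
    return "value: %s" % value

labels = [describe(row, "val") for row in data]
print(labels)
```

If null cells and missing columns should be treated the same, the simpler None default above is enough.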

It may be more Pythonic to just try, and ask forgiveness if it fails:

for row in ds:
    try:
        x = row['someOtherColumn']
    except ValueError as e:
        # column did not exist (adjust the exception type if your rows raise a different one)
        print 0
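
To avoid repeating the try/except for every column, it can be wrapped in a small helper. A rough sketch, with dicts standing in for the rows; get_or_default is a hypothetical name, and the exception tuple is an assumption that should be checked against what a missing column actually raises in your environment:

```python
# Assumed exception types; widen or narrow after testing a missing column.
MISSING_COLUMN_ERRORS = (KeyError, IndexError, ValueError)

def get_or_default(row, column, default=0):
    """Return row[column], or `default` if the lookup fails."""
    try:
        return row[column]
    except MISSING_COLUMN_ERRORS:
        return default

# Stand-in rows (assumption): dicts instead of PyDataset rows.
rows = [{"COLUMN_1": 1}, {"COLUMN_1": 2}]

printed = []
for row in rows:
    for column in ("COLUMN_1", "COLUMN_2"):
        printed.append(get_or_default(row, column))
print(printed)
```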

Is there any way of accessing the data safely so it doesn't throw an exception?

You should get used to handling exceptions instead of just trying to perfectly ask for things imo. It’s the pythonic coding style. Much better to do.

try:
    doSomeTask()
except SomeError as e:
    # handle error
    handleError()

Than to do

if canDoTask():
    doSomeTask()
else:
    handleError()

May not seem like it if you’re not used to it but once you get adjusted it makes things very easy to reason about imo. More info https://stackoverflow.com/questions/12265451/ask-forgiveness-not-permission-explain


You can get a list of column names in a dataset with:

system.dataset.getColumnHeaders(dataset)

With that list you can use the logic @Cose_Peter1 provided.


This is how I would do it:

	import traceback
	logger = system.util.getLogger("Column Count Test")
	try:
		#Test dataset saved in the root.custom props
		dataset = self.parent.custom.testDataset
		#Get the column count by looking at the length of the header list
		columnCount = len(system.dataset.getColumnHeaders(dataset))
		#Iterate as needed to accomplish the task
		for row in dataset:
			for col in range(columnCount):
				cellData = row[col]
				system.perspective.print(str(cellData))
	except:
		message = traceback.format_exc()
		#Print to the output console
		system.perspective.print(message)
		#Log any show stopping errors to the gateway   
		logger.error(message)

I also added my personal go-to error-catching method, which uses the traceback library.

If I could remember who showed me the traceback method I would give credit, but I don't remember.


I'd add the missing columns to the dataset:

ds = system.dataset.toDataSet(
	['name', 'val', 'square'],
	[
		[
			"foo_{}".format(n),
			n,
			n*n
		] for n in range(10)
	]
)

columns_to_print = set(('name', 'val', 'something'))
actual_columns = set(ds.columnNames)
missing_columns = columns_to_print - actual_columns

for c in missing_columns:
	ds = system.dataset.addColumn(ds, [0] * ds.rowCount, c, int)


lib.dataset.print_ds(ds)

so, basically:

def normalize_ds(ds, columns):
	for c in set(columns) - set(ds.columnNames):
		ds = system.dataset.addColumn(ds, [0] * ds.rowCount, c, int)
	return ds

Which could (and probably should) be improved in a multitude of ways:
Make the "required columns" a dict, or a list of dicts, so you can parameterize the type and default value of each one,
Allow users to choose whether to keep all existing columns or only what's in the required columns,
Validate the column types,
etc...
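
To make those ideas concrete, here is a rough sketch in plain Python. It works on a list of dicts rather than a real dataset, and normalize_rows, required, and keep_extra are all hypothetical names; an Ignition version would use system.dataset.addColumn as above:

```python
def normalize_rows(rows, required, keep_extra=True):
    """rows: list of dicts; required: {column: (type, default)}."""
    normalized = []
    for row in rows:
        # Either start from the full row or from nothing, per keep_extra.
        out = dict(row) if keep_extra else {}
        for column, (col_type, default) in required.items():
            if column in row:
                value = row[column]
                # Validate the type of existing values (nulls are allowed).
                if value is not None and not isinstance(value, col_type):
                    raise TypeError("column %r: expected %s, got %r"
                                    % (column, col_type.__name__, value))
                out[column] = value
            else:
                out[column] = default  # fill in the missing column
        normalized.append(out)
    return normalized

required = {"name": (str, ""), "something": (int, 0)}
rows = [{"name": "foo_0", "val": 0}]

with_extra = normalize_rows(rows, required)
only_required = normalize_rows(rows, required, keep_extra=False)
print(with_extra)
print(only_required)
```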

Assuming you want to do more than just printing the ds (in which case I'm not sure why you'd want to display a column full of zeros).
Also, what should happen to columns that are in the dataset but not in your column list?

But what's the use case exactly?
