I have a dataset (or PyDataset).
I loop through the data. Something like this:
for row in dataset:
    print row["COLUMN_1"]
    print row["COLUMN_2"]
    print row["COLUMN_3"]
    print row["COLUMN_4"]
    print row["COLUMN_5"]
However, any of the columns might not exist (I can't know in advance). Is there any way of accessing the data safely so it doesn't throw an exception?
I believe there is not a safe way. I'm really looking for any scripting tip to do it. The idea is to print the row or print 0 if the column doesn't exist.
Thanks!
I don't think there is a straightforward way to do this. Maybe you can use the
getColumnNames() function and convert the result into a set. Then, before printing, you just do a membership check:
columnNameSet = set(dataset.getColumnNames())
for row in dataset:
    if "COLUMN_1" in columnNameSet:
        print row["COLUMN_1"]
    else:
        print 0
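The same membership check can be extended to every column you want to print. The following is only a minimal sketch in plain Python: `column_names` and `rows` are hypothetical stand-ins for what `dataset.getColumnNames()` and the dataset rows would give you in Ignition, and `values_or_zero` is a made-up helper name, not part of any Ignition API.

```python
# Hypothetical sample data standing in for an Ignition dataset.
# In Ignition, the names would come from dataset.getColumnNames().
column_names = ["COLUMN_1", "COLUMN_3"]
rows = [[10, 30], [11, 31]]

# Columns we would like to print; COLUMN_2 does not exist in the data.
wanted = ["COLUMN_1", "COLUMN_2", "COLUMN_3"]


def values_or_zero(column_names, rows, wanted, default=0):
    """For each row, return the wanted values, using `default` for absent columns."""
    # Build the name -> position lookup once, outside the row loop.
    index_of = dict((name, i) for i, name in enumerate(column_names))
    result = []
    for row in rows:
        result.append([row[index_of[name]] if name in index_of else default
                       for name in wanted])
    return result


for line in values_or_zero(column_names, rows, wanted):
    print(line)
```

Building the lookup once keeps the per-row work to a dictionary check, which matters more as the dataset grows.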
Probably not great for performance (haven't tested on a large ds), but you could convert your PyDataset to a list of dicts like
data = [{column:row[column] for column in ds.getColumnNames()} for row in ds]
Then you can leverage the dictionary .get() method which allows you to provide a key and default return value if it does not exist.
for row in data:
    value = row.get("someColumn", None)  # or whatever you want your default value to be if it's not found
    if value is not None:
        # value was found, do something
        print value
Maybe more Pythonic to just try and ask forgiveness:
for row in ds:
    try:
        x = row['someOtherColumn']
    except ValueError, e:
        # column did not exist
        print 0
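If you go the try/except route for several columns, it may be tidier to wrap it in a small helper. This is a sketch only: `get_or_default` is a hypothetical name, the row here is a plain dict rather than a PyDataset row, and since I'm not certain which exception type a PyDataset row raises for a missing column, the helper catches several common lookup errors rather than just one.

```python
def get_or_default(row, column, default=0):
    """Return row[column], or `default` if the lookup fails."""
    try:
        return row[column]
    except (KeyError, IndexError, ValueError):
        # Assumption: a missing column surfaces as one of these lookup errors.
        return default


# Plain dict standing in for a dataset row (hypothetical sample data).
row = {"COLUMN_1": 5}
print(get_or_default(row, "COLUMN_1"))
print(get_or_default(row, "COLUMN_2"))
```

Keeping the tuple of exceptions narrow, rather than using a bare `except`, means genuine bugs elsewhere still surface instead of silently turning into zeros.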
"Is there any way of accessing the data safely so it doesn't throw an exception?"
You should get used to handling exceptions instead of trying to check everything perfectly up front, imo. It's the Pythonic coding style. It is much better to do
try:
    doSomeTask()
except someError, e:
    # handle error
    handleError()
than to do
if canDoTask():
    doSomeTask()
else:
    handleError()
It may not seem like it if you're not used to it, but once you get adjusted it makes things very easy to reason about, imo. More info: https://stackoverflow.com/questions/12265451/ask-forgiveness-not-permission-explain
You can get a list of column names in a dataset with:
system.dataset.getColumnHeaders(datasetName)
With that list you can use the logic @Cose_Peter1 provided.
This is how I would do it:
import traceback
logger = system.util.getLogger("Column Count Test")
try:
    # Test dataset saved in the root.custom props
    dataset = self.parent.custom.testDataset
    # Get the column count by looking at the length of the header list
    columnCount = len(system.dataset.getColumnHeaders(dataset))
    # Iterate as needed to accomplish the task
    for row in dataset:
        for col in range(columnCount):
            cellData = row[col]
            system.perspective.print(str(cellData))
except:
    message = traceback.format_exc()
    # Print to the output console
    system.perspective.print(message)
    # Log any show-stopping errors to the gateway
    logger.error(message)
I also added my personal go-to error-catching method using the traceback library.
If I could remember who showed me the traceback method I would give credit, but I don't remember.
I'd add the missing columns to the dataset:
ds = system.dataset.toDataSet(
    ['name', 'val', 'square'],
    [
        [
            "foo_{}".format(n),
            n,
            n*n
        ] for n in range(10)
    ]
)
columns_to_print = set(('name', 'val', 'something'))
actual_columns = set(ds.columnNames)
missing_columns = columns_to_print - actual_columns
for c in missing_columns:
    ds = system.dataset.addColumn(ds, [0] * ds.rowCount, c, int)
lib.dataset.print_ds(ds)
so, basically:
def normalize_ds(ds, columns):
    for c in set(columns) - set(ds.columnNames):
        ds = system.dataset.addColumn(ds, [0] * ds.rowCount, c, int)
    return ds
Which could (and probably should) be improved in a multitude of ways:
Make the "required columns" a dict, or list of dicts, so you can specify, for each of them, what its type and default value should be,
Allow users to choose whether they want to keep all existing columns or only those in the required columns,
Validate the column types,
etc.
That's assuming you want to do more than just print the ds (in which case I'm not sure why you'd want to display a column full of zeros).
Also, what should happen to columns that are in the dataset but not in your column list?
But what's the use case exactly?
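For what it's worth, the first two improvements listed above could be sketched roughly like this. It is plain Python working on a list of column names and a list of row lists, not on a real Ignition dataset, so `normalize`, `required`, and `keep_extra` are all hypothetical names; in Ignition you would build the result with `system.dataset.addColumn` or `system.dataset.toDataSet` instead. Type validation is left out.

```python
def normalize(column_names, rows, required, keep_extra=True):
    """Return (names, rows) in which every column listed in `required` exists.

    `required` maps column name -> default value used when the column is absent.
    With keep_extra=False, columns not listed in `required` are dropped.
    """
    index_of = dict((name, i) for i, name in enumerate(column_names))
    if keep_extra:
        out_names = list(column_names)
    else:
        out_names = [n for n in column_names if n in required]
    # Append missing columns in sorted order so the result is deterministic.
    out_names += sorted(n for n in required if n not in index_of)
    out_rows = [
        [row[index_of[n]] if n in index_of else required[n] for n in out_names]
        for row in rows
    ]
    return out_names, out_rows


# Hypothetical sample data mirroring the 'name', 'val', 'square' example.
names = ["name", "val", "square"]
data = [["foo_{}".format(n), n, n * n] for n in range(3)]
required = {"name": "", "val": 0, "something": 0}

new_names, new_rows = normalize(names, data, required)
print(new_names)
for r in new_rows:
    print(r)
```

Passing `keep_extra=False` answers the "what about columns not in your list" question by dropping them; leaving it at the default keeps them in their original positions.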