def mapColumn(function, dataset, column):
    # Accept either a column name or a column index.
    if isinstance(column, basestring):
        column = dataset.getColumnIndex(column)
    # Apply the function to every value in the target column.
    columnData = map(function, dataset.getColumnAsList(column))
    # Drop the original column, then re-insert the mapped values at the same position.
    columnsToKeep = range(dataset.columnCount)
    columnsToKeep.pop(column)
    dsWithoutColumn = system.dataset.filterColumns(dataset, columnsToKeep)
    return system.dataset.addColumn(
        dsWithoutColumn,
        column,
        columnData,
        dataset.getColumnName(column),
        dataset.getColumnType(column)
    )
It's relatively easy to create a 'mapColumn' function composed of our base primitives. This will be pretty efficient.
Usage is pretty simple: declare a function that performs the calculation you want on each value in the column, then pass it (no parentheses!) to the mapColumn function:
# usage:
def updateColumn(value):
    return value * 2

updatedDs = mapColumn(updateColumn, ds, "b")
On my test dataset, this does what you'd expect:
Before:
a | b  | c
1 | 2  | 3
4 | 5  | 6

After:
a | b  | c
1 | 4  | 3
4 | 10 | 6
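
If you want to play with the idea outside of Ignition, here's a rough sketch of the same pattern on plain Python lists. It assumes Python 3 and uses a list of headers plus a list of rows as a stand-in for a dataset; `map_column_rows` and `double` are hypothetical names for illustration, not part of `system.dataset`.

```python
def map_column_rows(function, headers, rows, column):
    """Return new rows with `function` applied to one column."""
    # Accept a column name or an index, like mapColumn above.
    if isinstance(column, str):
        column = headers.index(column)
    # Build fresh rows so the input list is left untouched.
    return [
        row[:column] + [function(row[column])] + row[column + 1:]
        for row in rows
    ]


def double(value):
    return value * 2


headers = ["a", "b", "c"]
rows = [[1, 2, 3], [4, 5, 6]]

# Pass the function itself (no parentheses), same as with mapColumn.
updated = map_column_rows(double, headers, rows, "b")
print(updated)  # [[1, 4, 3], [4, 10, 6]]
```

A lambda works as well for one-off calculations, e.g. `map_column_rows(lambda v: v + 1, headers, rows, 0)`.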