Deleting pyDataset column (deleteRow) causes addColumn malfunction

diamond · July 15, 2024, 3:05pm

Hello,

I have this code, which does the following:

It copies one pydataset column into a list.
Deletes the copied pydataset column.
It creates a new column with the same data.

listData = []
for row in ds:
	listData.append(str(row["data"]))
	print len(listData), row["data"]
	
print "-----------------------------------------"
print "listData, len:", len(listData)
print "ds.getRowCount, len:", ds.getRowCount()
print "-----------------------------------------"
	
#Deleting column...
col_index = ds.getColumnIndex("data")	
ds = system.dataset.deleteRow(ds, col_index)
print "Column deleted!"	

#New column with same data...	
ds = system.dataset.addColumn(ds, listData, "newCol", str)
print "New column added!"

The script fails with: IndexError: Number of values (21) doesn't match number of rows (20) in dataset, which doesn't look right to me.

If I add the new column without removing the old one, it works.

This doesn't make sense to me. Is this a bug, or am I missing something?

Thanks!

pturmel · July 15, 2024, 3:10pm

That deletes a row, not a column.

diamond · July 15, 2024, 3:18pm

Pfffff.... no comments

lrose · July 15, 2024, 3:18pm

system.dataset.deleteRow() deletes a row, not a column. Your script passes the column index as a row index, so one row is removed and the dataset is left with 20 rows, while listData still holds 21 values. So yes, the error message is actually correct.

To remove a column, use system.dataset.filterColumns(), which returns a dataset containing only the columns you list:

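dataIndex = ds.getColumnIndex("data")
# Copy the column values as strings, to match the str column type used below
listData = [str(value) for value in ds.getColumnAsList(dataIndex)]
# Keep every header except "data" (pop() would return the removed name, not the remaining list)
filteredHeaders = [name for name in ds.getColumnNames() if name != "data"]

# Remove the column by keeping only the other columns, then add the data back under a new name
ds = system.dataset.filterColumns(ds, filteredHeaders)
ds = system.dataset.addColumn(ds, listData, "newCol", str)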
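
If you want to see where the 21 vs. 20 mismatch comes from without a gateway handy, here is a rough plain-Python model of the two operations. The helpers (delete_row, filter_columns, add_column) and the extra "id" column are made up for illustration. They only mimic the behavior described above and are not the Ignition system.dataset functions.

```python
# Plain-Python stand-in for a dataset: a list of headers plus a list of rows.
# These helpers are hypothetical and only mimic the behavior discussed in
# this thread; they are not the Ignition system.dataset functions.

def delete_row(headers, rows, row_index):
    """Return a copy of the data with one ROW removed."""
    return headers, rows[:row_index] + rows[row_index + 1:]

def filter_columns(headers, rows, keep):
    """Return a copy of the data containing only the columns named in keep."""
    idx = [headers.index(name) for name in keep]
    return [headers[i] for i in idx], [[row[i] for i in idx] for row in rows]

def add_column(headers, rows, values, name):
    """Append a column; the value count must match the row count."""
    if len(values) != len(rows):
        raise IndexError("Number of values (%d) doesn't match number of rows (%d)"
                         % (len(values), len(rows)))
    return headers + [name], [row + [v] for row, v in zip(rows, values)]

# Assumed sample data: an "id" column plus the "data" column, 21 rows.
headers = ["id", "data"]
rows = [[i, "value%d" % i] for i in range(21)]
list_data = [str(row[1]) for row in rows]

# What the original script did: the column index (1) was used as a row index,
# so one row disappears and the value count no longer matches.
h1, r1 = delete_row(headers, rows, headers.index("data"))
try:
    add_column(h1, r1, list_data, "newCol")
    mismatch = None
except IndexError as e:
    mismatch = str(e)

# What was intended: drop the column, keep all 21 rows, then add the new one.
keep = [name for name in headers if name != "data"]
h2, r2 = filter_columns(headers, rows, keep)
h2, r2 = add_column(h2, r2, list_data, "newCol")
```

In this model the first attempt leaves 20 rows against 21 values, which is the same mismatch the IndexError reports, while the filter-then-add path keeps all 21 rows.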