Python bug or python bug?

jose.granero · January 31, 2019, 1:05pm

I have the following code to create a new dataset from an existing one, and I can’t figure out why the resulting dataset has the same data in every row. The problem shows up at the end: although the variable Newdata changes on every iteration, every row of Datos ends up equal to the last Newdata row.

Any ideas?


dataset1=system.dataset.toPyDataSet(event.source.parent.dataset1)
tagpaths=[]
fechas=[]
Newdata=[]
Datos=[]
headers=[]

headers.append('ts')
for row in range(dataset1.rowCount):
	tagpath=dataset1.getValueAt(row,'pv')
	if tagpath not in tagpaths:
		tagpaths.append(tagpath)
		headers.append(tagpath)
	fecha=dataset1.getValueAt(row,'ts')
	if fecha not in fechas:
		fechas.append(fecha)
		
longitud=len(headers)
for j in range(longitud):
	Newdata.append(None)


for i in range(len(fechas)):

	Newdata[0]=fechas[i]
		
	for row in range(dataset1.rowCount):
		tagpath=dataset1.getValueAt(row,'pv')
		valor=dataset1.getValueAt(row,'value')
		ts=dataset1.getValueAt(row,'ts')
		
		if ts==fechas[i]:
			for k in range(1,longitud):
				if tagpath==headers[k]:
					Newdata[k]=valor
	Datos.append(Newdata)
	print Newdata	 

datos_ordenados=system.dataset.toDataSet(headers,Datos)

event.source.parent.dataset2=datos_ordenados

lrose · February 8, 2019, 4:14pm

It is because you have defined Newdata outside the for loop in which it is appended to the Datos list. This means that you are appending the same instance of Newdata to Datos multiple times, and each time through the loop you update that one instance, so every row in your dataset ends up with the same values.

One small adjustment will correct your issue: move the creation of Newdata inside the for loop where it is used, so that it is re-instantiated on each pass through the loop.

dataset1=system.dataset.toPyDataSet(event.source.parent.dataset1)
tagpaths=[]
fechas=[]
#Newdata=[] Don't define it here.
Datos=[]
headers=[]

headers.append('ts')
for row in range(dataset1.rowCount):
	tagpath=dataset1.getValueAt(row,'pv')
	if tagpath not in tagpaths:
		tagpaths.append(tagpath)
		headers.append(tagpath)
	fecha=dataset1.getValueAt(row,'ts')
	if fecha not in fechas:
		fechas.append(fecha)
		
longitud=len(headers)
#for j in range(longitud):   This for loop needs to be moved so that you're working on the correct Newdata
#	Newdata.append(None)


for i in range(len(fechas)):
	#Here is where you should create the instance of Newdata
	Newdata = []
	for j in range(longitud):
		Newdata.append(None)

	Newdata[0]=fechas[i]
		
	for row in range(dataset1.rowCount):
		tagpath=dataset1.getValueAt(row,'pv')
		valor=dataset1.getValueAt(row,'value')
		ts=dataset1.getValueAt(row,'ts')
		
		if ts==fechas[i]:
			for k in range(1,longitud):
				if tagpath==headers[k]:
					Newdata[k]=valor
	Datos.append(Newdata)
	print Newdata	 

datos_ordenados=system.dataset.toDataSet(headers,Datos)

Note that you convert the dataset to a PyDataSet but never use it as one, so the call is unnecessary. It’s a minor thing and I didn’t remove it, but removing it will save some execution time in the script.

This is a common mistake, I’ve been burned on it a time or 400 myself.
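The effect can be reproduced without Ignition. The following is a minimal sketch in plain Python, assuming nothing beyond built-in lists; the names shared_row, fresh_row, buggy and fixed are illustrative and do not appear in the original script.

```python
# Buggy pattern: one list object is created once, outside the loop,
# and the same object is appended on every pass.
shared_row = [None, None]
buggy = []
for ts in ["t1", "t2", "t3"]:
    shared_row[0] = ts
    buggy.append(shared_row)  # appends a reference to the same list each time

# Fixed pattern: a new list object is created on every pass.
fixed = []
for ts in ["t1", "t2", "t3"]:
    fresh_row = [None, None]
    fresh_row[0] = ts
    fixed.append(fresh_row)

print(buggy)  # every entry shows the values written on the last pass
print(fixed)  # each entry keeps the values from its own pass
```

In the buggy version the outer list has three entries, but all three refer to one row, which is why only the last values survive.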

jose.granero · February 8, 2019, 8:22pm

Hi lrose,

Thank you very much for your help, although I don’t want to initialize Newdata for every iteration. I’ll try it and let you know.

Best regards,

Sanderd17 · February 11, 2019, 7:20am

Why not?

The Datos.append(Newdata) step is very fast in Python, because it doesn't copy the data, but just adds a pointer (or reference) to the Newdata object. So if you change the Newdata object afterwards, you're changing the data seen through the Datos list too (it's at the same memory location).

If you need different data for each row, you need to make a new object on each iteration. That way you get new memory addresses to store different info.
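One way to check this for yourself is Python's `is` operator, which tests whether two names refer to the same object. This is a small sketch with made-up variable names (row, rows and the three result names are not from the script above).

```python
row = [1, 2]
rows = []
rows.append(row)

# The list element and the original name refer to one and the same object.
same_object = rows[0] is row

# Mutating through the original name is therefore visible through the list.
row[1] = 99
seen_through_list = rows[0][1]

# list(row) builds a new list object holding the same values.
rows.append(list(row))
independent = rows[1] is not row

print(same_object, seen_through_list, independent)
```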

jose.granero · February 11, 2019, 9:20am

Hi Sanderd,

Thank you for your answer.

From a functional point of view I need Newdata to be initialized for every iteration with the values it had in the previous iteration, as I’ve unsuccessfully tried with the following code:

dataset1=event.source.parent.dataset1
tagpaths=[]
fechas=[]
Datos=[]
headers=[]

headers.append('ts')
for row in range(dataset1.rowCount):
	tagpath=dataset1.getValueAt(row,'pv')
	if tagpath not in tagpaths:
		tagpaths.append(tagpath)
		headers.append(tagpath)
	fecha=dataset1.getValueAt(row,'ts')
	if fecha not in fechas:
		fechas.append(fecha)
		
longitud=len(headers)

for i in range(len(fechas)):
	Newdata=[]
	if i==0:
		for j in range(longitud):	
			Newdata.append(None)
			
	else:
		Newdata=Datos[i-1]
		print Newdata
	
	Newdata[0]=fechas[i]
		
	for row in range(dataset1.rowCount):
		tagpath=dataset1.getValueAt(row,'pv')
		valor=dataset1.getValueAt(row,'value')
		ts=dataset1.getValueAt(row,'ts')
		
		if ts==fechas[i]:
			for k in range(1,longitud):
				if tagpath==headers[k]:
					Newdata[k]=valor
	Datos.append(Newdata)

lrose · February 11, 2019, 1:22pm

You were close.

Similar mistake though. Remember that you are not appending a copy of the object to the list, but a reference to it.

The list stores a reference to the object (in effect, where it lives in memory). Usually you don’t notice, because the item behaves just like the object itself. It falls apart when you do something like what you’re doing here, because assignment passes that reference along rather than copying the values.

In your code this line:

Newdata=Datos[i-1]

takes the last item in the Datos list, which is a reference to the Newdata object created on the previous trip through the loop, and makes Newdata refer to that same object.

What you want to do is get the values in that object and use them to initialize the next iteration. (Assuming I understand what you’re looking for)
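The difference between plain assignment and copying the values can be sketched as follows. This is an illustration with lowercase stand-in names (datos, newdata) and made-up values, not the original data; list(...) is used here as one way to make a shallow copy.

```python
# Plain assignment: newdata becomes another name for the previous row.
datos = [["t1", 10, None]]
newdata = datos[-1]
newdata[0] = "t2"
previous_row_after_alias = list(datos[0])  # snapshot of the (now modified) previous row

# Copying the values: newdata starts from the previous row's values
# but is a separate list, so the previous row is left untouched.
datos = [["t1", 10, None]]
newdata = list(datos[-1])
newdata[0] = "t2"
datos.append(newdata)

print(previous_row_after_alias)
print(datos)
```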

Try the following:

dataset1=event.source.parent.dataset1
tagpaths=[]
fechas=[]
Datos=[]
headers=[]

headers.append('ts')
for row in range(dataset1.rowCount):
	tagpath=dataset1.getValueAt(row,'pv')
	if tagpath not in tagpaths:
		tagpaths.append(tagpath)
		headers.append(tagpath)
	fecha=dataset1.getValueAt(row,'ts')
	if fecha not in fechas:
		fechas.append(fecha)
		
longitud=len(headers)

for i in range(len(fechas)):
	Newdata=[]
	if i==0:
		for j in range(longitud):	
			Newdata.append(None)
			
	else:
		for l in Datos[i-1]:
			Newdata.append(l)
	
	Newdata[0]=fechas[i]
		
	for row in range(dataset1.rowCount):
		tagpath=dataset1.getValueAt(row,'pv')
		valor=dataset1.getValueAt(row,'value')
		ts=dataset1.getValueAt(row,'ts')
		
		if ts==fechas[i]:
			for k in range(1,longitud):
				if tagpath==headers[k]:
					Newdata[k]=valor
	Datos.append(Newdata)
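
For testing the logic outside Ignition, the same approach can be sketched as a plain Python function. This is only an illustration under stated assumptions: the function name pivot_with_carry_forward is hypothetical, the input is assumed to be a list of (ts, pv, value) tuples rather than a dataset, and list(...) replaces the element-by-element copy loop.

```python
def pivot_with_carry_forward(records):
    """Pivot (ts, pv, value) tuples into one row per timestamp.

    Each new row starts from a copy of the previous row, so tags with no
    value at a given timestamp keep their last known value.
    """
    tagpaths = []
    fechas = []
    for ts, pv, _value in records:
        if pv not in tagpaths:
            tagpaths.append(pv)
        if ts not in fechas:
            fechas.append(ts)
    headers = ["ts"] + tagpaths

    datos = []
    for ts in fechas:
        if datos:
            newdata = list(datos[-1])  # copy the values, not the reference
        else:
            newdata = [None] * len(headers)
        newdata[0] = ts
        for rec_ts, pv, value in records:
            if rec_ts == ts:
                # +1 skips the 'ts' column at index 0
                newdata[tagpaths.index(pv) + 1] = value
        datos.append(newdata)
    return headers, datos


# Made-up sample data for illustration only.
sample = [("t1", "A", 1), ("t1", "B", 2), ("t2", "A", 3), ("t3", "B", 4)]
headers, datos = pivot_with_carry_forward(sample)
print(headers)
print(datos)
```

Inside Ignition, the returned headers and datos could then be passed to system.dataset.toDataSet(headers, datos), as in the scripts above.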

jose.granero · February 11, 2019, 3:04pm

Hi lrose,

It does exactly what I was looking for.

Thank you both for helping me understand the concept.

ananyagupta1214 · September 11, 2019, 7:32am

Hello, the code you shared is really helpful. I used it, although it was tough to implement because I am new to Java coding; I have just started learning at CETPA.