Reading and Processing CSVs - intermittent errors

Hey folks,

Apologies for the formatting.

I have a batch of CSV files that I would like to read in to a dataset, find the line of best fit equation, then use this equation to generate a new dataset at regular intervals.

I have a functioning script, however I’m encountering issues reading certain CSVs in the batch. I’m unsure why this is occurring, as the CSVs are all sourced in the same manner and appear to be formatted the same when I inspect them.

The error I am receiving is as follows:

Traceback (most recent call last):

File "<event:actionPerformed>", line 46, in CSV
ValueError: invalid literal for __float__: 1.#QNAN00

Ignition v7.9.10 (b2018112821)
Java: Azul Systems, Inc. 9.0.7.2

And for what it’s worth, here’s the script I am running - testing on buttons and with tables for the time being.

def CSV (esp = event.source.parent):
	import string
	import system
	import csv
	import shared
	
	DataOK = 1
	csvData =[]
	csvRow =[]
	
	# get CSV
	path = system.file.openFile('csv')
	if path != None:
	# open & read csv
		f=open(path,'rb')
		reader = csv.reader(f)
	#process file	
		rowNum = 0
		for row in reader:
	#First pass: Header row
			if rowNum == 0:
				header = row
	#Next Passess, Process rest of Data			
			else:
				if DataOK:
					csvData.append(row)
			rowNum +=1
	# Close CSV File
		f.close()
		
	# make a dataset and start manipulating it		
		ds = system.dataset.toDataSet(header, csvData)
		data = system.dataset.toPyDataSet(ds)

		#Set up column lists
		Volume = []
		Level = []
		Tank = 0

The following lines are where the error occurs

		for row in data:
			#Volume
			Volume.append(float(row["Volume"]))
			#Level
			Level.append(float(row["Raw Level"]))
			#Tank
			Tank = float(row["Tank Number"])	

		#Line of best fit function
		#Produced a list with the constants from the line 
		lobf = shared.math.polyfit(Level, Volume, 3, False)
		#Find max level
		maxLevel = int(max(Level))
		
		i=0
		dataOutput = []
		Headers = ["Tank", "Level", "Volume"]

		#Calculate new dataset using line of best fit
		#i^3(volume)+i^2(volume)+i(volume)+(y-int)
		for i in range(maxLevel):
		#i squared / cubed
			values = [i, i]
			i2 = system.math.product(values)
			values = [i2, i]
			i3 = system.math.product(values)	
		#3rd order
			values = [i3, lobf[3]]
			p3 = system.math.product(values)
		#2nd order
			values = [i2, lobf[2]]
			p2 = system.math.product(values)
		#first order
			values = [i, lobf[1]]
			p1 = system.math.product(values)
		#y intercept
			values = [lobf[0], p1, p2, p3]
			sum = system.math.sum(values)
			dataOutput.append([int(Tank), i, sum])
			i+=1

		# Push data to Table on this screen
		# assign display labels
		tankChart = system.dataset.toDataSet(Headers, dataOutput)
				
		def writeData(tankChart = tankChart, header = header, csvData = csvData, esp = esp):
			# data read in
			Table = esp.getComponent('Table')
			Table.data = system.dataset.toDataSet(header, csvData)	
			system.db.refresh(Table, "data")

			# data output from equation			
			Table1 = esp.getComponent('Table 1')
			Table1.data = tankChart	
			system.db.refresh(Table1, "data")			
		system.util.invokeLater(writeData)

try:
	system.util.invokeAsynchronous(CSV)
except:
	system.gui.errorBox( "An unexpected error importing the CSV", "Error!")

Finally, this is the CSV format.

RowNum,Date,Time,Tank Number,Level,Raw Level,Volume,Temp,Flow Rate,Chart Num
32068,02/05/2019,03:33:35,913.0000000,254.0000000,254.3789978,2042.1732178,47.4560165,0.0000000,61.0000000
32069,02/05/2019,03:33:35,913.0000000,254.0000000,254.3791046,2042.1732178,47.4560165,0.0000000,61.0000000

It sounds like you may have already checked this, but does the above show up somewhere in the CSV file it chokes on?

Not present anywhere. The file contents are indistinguishable from other CSVs. If I comment out the line in the code where it hangs, it hangs on the next one with the same error.

Is there any reason your code can’t perform the best-fit calculation on the rows of data while reading the CSV? You don’t do anything with the intermediate dataset except convert it to a PyDataset and iterate through it again. Those conversions enforce typing on the columns and could be introducing that NAN.

Also, your final try-catch will never catch an error in your background thread. Because it is running in a background thread. That try-catch can only capture an error starting that thread.

1 Like

Assuming you don’t have non-printing character in CSV (doesn’t seem likely in this case due to same source and intermittent nature), the NAN must be showing up in the conversion to dataSet or pyDataSet as Phil suggests. I like his suggestion to eliminate the extra conversions. If the error still occurs and is consistent on a certain file, you could narrow it down to the value causing the issue with some logging.

Thanks @pturmel , @witman

Parts of the code were sourced from other working components, thus the 2-step process.
I also hope to move the raw CSV data into a database for later use, but I can add that later.

This is where I wound up and I am no longer getting an error on the file. Need to test some more to confirm, but it appears that the extra conversions were indeed causing the issue.

def CSV (esp = event.source.parent):
	import string
	import system
	import csv
	import shared

	# get file
	path = system.file.openFile('csv')
	if path != None:
	# open & read csv
		f=open(path)
		reader = csv.DictReader(f)
	#process file	
		Volume = []
		Level = []
		Tank = 0
		rowNum = 0
		for row in reader:
			#First pass: Header row
			if rowNum != 0:
				#Volume
				Volume.append(float(row["Volume"]))
				#Level
				Level.append(float(row["Level"]))
				#Tank
				Tank = row["Tank Number"]
			rowNum+=1
	# Close CSV File
		f.close()
	# Perform line of best fit calculations
		lobf = shared.math.polyfit(Level, Volume, 3, False)
	#Determine height to calculate to
		maxLevel = max(Level)
	#Variables for calculations
		i=0
		dataOutput = []
		Headers = ["Tank", "Level", "Volume"]
		for i in range(maxLevel):
		#i2/3
			values = [i, i]
			i2 = system.math.product(values)
			values = [i2, i]
			i3 = system.math.product(values)
		#3rd order
			values = [i3, lobf[3]]
			p3 = system.math.product(values)
		#2nd order
			values = [i2, lobf[2]]
			p2 = system.math.product(values)
		#first order
			values = [i, lobf[1]]
			p1 = system.math.product(values)
		#y intercept = lobf[0]
		#build value for row
			values = [lobf[0], p1, p2, p3]
			sum = system.math.sum(values)
		#output to list
			dataOutput.append([Tank, i, sum])
			i+=1
			
		tankChart = system.dataset.toDataSet(Headers, dataOutput)
	# Push data to Table on this screen (i.e. review before putting into database)	
	# assign display labels
		def writeData(tankChart = tankChart, esp = esp):
			Table1 = esp.getComponent('Table 1')
			Table1.data = tankChart	
			system.db.refresh(Table1, "data")			
		system.util.invokeLater(writeData)