Help reading in large CSV

I have a CSV file with 5500 rows, When I run this script in the script console the output stops at row 1925… What would cause this ?

[code]import fileinput

inputFile=system.file.openFile(‘csv’)
values = ‘’
flag = 0
columns = ‘’
x = 0
if inputFile != None:
fileinput.close()
for lineIn in fileinput.input(inputFile):

  flagTest=lineIn[:-(len(lineIn)-5)]   #Extract first five characters of the line.
  if flag==1:                         
     columns='(' + lineIn + ')'       #column list for query
     flag=0                           #Reset flag
  if flag==2:
     values+="('" + lineIn.replace('"','').replace(",","','") + "'),"  #add set of values. Strip away all double quotes,
                                                                       #  add single quotes to all values,
                                                                       #  wrap in parentheses,
                                                                       #  and pop a comma on the end.
                                                                       #  The comma prepares for the next set of values.
  
  if flagTest =='#NAME':               #Set flag according to '#" line delimiter
     flag=1
  elif flagTest == '#ROWS':
     flag=2
  elif flagTest == '#TYPE':
     flag=0

#fileinput.close()
values=values[:-1] #Strip off last comma after all the values have been processed
query="INSERT INTO table " + columns + " VALUES "+ values
fileinput.close()
print query
#system.db.runUpdateQuery(query)[/code]

Have you checked what is peculiar to line 1925 in your source file?
Maybe an errant comma or blank row?

Here is a lines 1919 - 1930 of my CSV. I’m not seeing anything that stands out.

[code]#NAMES,
NBP,WH,Location,Dis,Cost,Vendor,Dep,GL,Machine,NDX
#TYPES,
str,str,str,str,F,str,str,F,str,I
#ROWS,5500,

209405,3R,M8,480 SILENCER,4.6,SSI,1332,712101,NSLlli ,1919
209406,3R,L9,6545 CONNECTOR,10.81,SSI,1332,712101, Skinner,1920
209407,3R,M8,13823-ASSY OIL TANK,341.91, SSI,1332,712101,NSL60 ,1921
209408,3R,RETAINING RING 13080024,1.43,7777777,7777777,7777777,1922
209411,3R,999999,INFEED ROLLER 71CD80421500,227.79,PAK-ULPACKAGING,1332,712169,Mond,1923
209412,3R,999999,INFD ROLLER SHAFT 71CD80421408,227.79,PAK-UL ACKAGING,1332,712169,Mond,1924
209413,3R,61801 2RS BEARING,8.18,7777777,7777777,7777777,1925
209414,3R,H4,CONTACTOR 952009,188.3,TECHNICAL INC,1332,712100,Brine Tank,1926
209415,3R,H4,OVERLOAD 952507,211.1,CEDNC,1332,712100,Brine Tank,1927
209416,3R,30015,AUTOMATIC SWITCH 2210118,160.3,TECHNICAL INC,1332,712100,Inj,1928
209417,3R,30015,CONTACT BLOCK 203134,22.6,TECHNICAL INC,1332,712100,Injs,1929
209418,3R,Q,FENCE ASSY 331248,3454.08,MA INC,1332,712101,400 SAW,1930[/code]

Just for grin’s I made a new csv with just lines 1919-1930 and it ran fine. :scratch:

What type of database are you inserting these into?
Most have a limitation set on how long the insert statement can be character wise from a client.

We are using MS SQL
I’m sure the limit is going to bite me but… I’m basically dumping right into the print command ,system.db.runUpdateQuery is not in play here so the insert limit shouldn’t stop the print.

So I guess even if I were to get this to run it probably would exceed the insert limit. Dose anyone know how I would split this up into smaller sections ?

I guess I’m going about this the wrong way. I’m new to this so it might be easier for me to explain my goal and hopefully you guys can tell me if I’m barking up the wrong tree and maybe point me in the right direction.

I want to have a button that will

  1. Export the SQL table
    2.Delete all rows in the table
    3.insert a CSV into the table

The limit on an insert statement using VALUES in MS SQL is 1000 rows per statement.
You could do this a few different ways, 2 that come to mind are

  1. Do a single record insert on each loop of your FOR LOOP
  2. Use BULK INSERT with the CSV

Thanks everyone for you help so far :thumb_left:

I think I figured out how to split up the queries(code below)… What would be the optimal size for the query ? I also added a sleep function to pause the code between queries how long should I set this delay for?

I know query size and time between queries would be based on network/server performance but if you could give me an conservative best guess that would not end with the IT guys duct taping me to the server room door I would appreciate it. :wink:

also if you could peek at my code and let me know if this might work.

[code]import fileinput
import time

inputFile=system.file.openFile(‘csv’)
values = ‘’
flag = 0
columns = ‘’
start = time.time() # used to calculate elapsed time
End = time.time() # used to calculate elapsed time
el = 0 # used to calculate elapsed time
x = 0 # size of query…
if inputFile != None:
fileinput.close()
for lineIn in fileinput.input(inputFile):
flagTest=lineIn[:-(len(lineIn)-5)] #Extract first five characters of the line.
if flag==1:
columns=’(’ + lineIn + ‘)’ #column list for query
flag=0 #Reset flag
if flag==2:
x = x + 1
values+="(’" + lineIn.replace(’"’,’’).replace(",","’,’") + “’),” #add set of values. Strip away all double quotes,
if x == 10: # size of query…
values=values[:-1]
print values
print ‘Build query here’
print ’ Run system.db.runUpdateQuery here’
x = 0
values = “”
time.sleep(.5)
# wrap in parentheses,
# and pop a comma on the end.
# The comma prepares for the next set of values.

  if flagTest =='#NAME':               #Set flag according to '#" line delimiter
     flag=1
  elif flagTest == '#ROWS':
     flag=2
  elif flagTest == '#TYPE':
     flag=0

fileinput.close()
#values=values[:-1] #Strip off last comma after all the values have been processed
query="INSERT INTO table " + columns + " VALUES "+ values
End = time.time() # used to calculate elapsed time
el = End - start # used to calculate elapsed time
print el # Print elapsed time
if x != 0 :
print ‘Build query here’
print ’ Run system.db.runUpdateQuery here’
print values
#print query
#system.db.runUpdateQuery(query)[/code]