Java Heap Space error

I have a script that downloads a file from an FTP site, unzips it, and writes the unzipped contents to a temp file. I then read the temp file line by line and delimit the data to write to another temp CSV file. Finally, I call a stored procedure using the CSV file.

My problem is that the file has grown over the years and I am now receiving a Java heap space error. I have increased the project's memory allocation to 2 GB, and the unzipped file has grown to ~147 MB.

I can watch Task Manager and see the memory for java.exe increase and never get deallocated as the script progresses.

Is there a file I am not closing, or a way for me to deallocate memory once my script is done with files? Or is there a better way to achieve what I am doing?

The script throws the error on this line: csvAsBytes = system.file.readFileAsBytes(csvFile)

from ftplib import FTP
import system

###Get gz file through FTP
ftp = FTP()
ftp.connect('ftpurl', 21)	# connect to host, default port
ftp.login('username', 'password')

filename = system.file.getTempFile("gz") #create temp file

ftpFile = open(filename,'wb')
try:
	ftp.retrbinary('RETR FILENAME.gz', ftpFile.write)	# retrbinary expects a full RETR command; download and write to the temp file
except Exception:
	def fileMissing():
		import system
		system.gui.warningBox("The History File is missing from the FTP site.")
	system.util.invokeLater(fileMissing)
	print "Error in downloading the remote file."
else:
	def successfulDownload():
		import system
		system.gui.messageBox("Successful download!")
	system.util.invokeLater(successfulDownload)
	print "Successful download!"

ftpFile.close()
ftp.close()

print "ftp closed"


###extract the contents of the gz file
import gzip

gzipFile = gzip.open(filename)	# open the downloaded .gz temp file for reading
contents = gzipFile.read()	# reads the entire decompressed file into memory at once
gzipFile.close()

print "file unzipped"


###Put contents of gz file in temporary file
newFilename = system.file.getTempFile("txt")

print "created temp file for unzipped content"

system.file.writeFile(newFilename, contents)

contents = None

print "wrote unzipped content to temp file"


text_file = open(newFilename, "r")
lines = text_file.readlines()	# reads every line of the temp file into memory at once
text_file.close()

print "read lines from temp file"

csvList = []
for line in lines:
	line2 = line.strip().replace("\"","").split(",")
	if line2[0] == "DETAIL":	# keep only DETAIL records
		csvList.append(",".join(line2) + "\n")

lines = None

print "put lines in list"

csvFile = system.file.getTempFile("txt")

print "created temp file for escaped csv file"

csvWriteFile = open(csvFile, 'w')

for csvLine in csvList:
	csvWriteFile.write(csvLine)

print "wrote csv data to temp file"

csvWriteFile.close()

print csvFile

csvAsBytes = system.file.readFileAsBytes(csvFile)

print "converted csv file to bytes"

system.file.writeFile("filepath",csvAsBytes)

print "put csv as bytes file on server"

call = system.db.createSProcCall(event.source.parent.databaseName+"StoredProcedure")
system.db.execSProcCall(call)

print "called stored procedure"

Instead of reading the file into memory just to turn around and write it back out to another file, have you tried using some of Python’s built-in functions to simply move or copy the file to the file server?
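
For example, a minimal sketch with shutil, assuming csvFile still holds the temp CSV path and "filepath" stands in for the same server destination used in the original script:

import shutil

# Copy the temp file straight to its destination; nothing gets loaded into memory.
shutil.copyfile(csvFile, "filepath")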

Jonathan,

I’ve done something similar in the past where I was analyzing very large files in zipped format that were far too large to extract into memory. What I ended up doing was using the Python zip library and writing an unzip routine that extracted something like 16 KB of data at a time, analyzed it, wrote whatever I wanted to a file, and then extracted the next 16 KB. It was very fast, used little RAM, and I was able to rip through endless files one after another.

I don’t have the project right in front of me, but I’ll dig around. In the meantime, take a look at the zip libraries and you’ll figure something out.
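
Here’s a rough sketch of that idea adapted to the .gz file from the original script, using gzip rather than zipfile and reusing the filename and csvFile temp paths from above; treat it as a starting point, not a drop-in replacement. It filters the DETAIL rows while only ever holding one ~16 KB chunk of decompressed data in memory:

import gzip

CHUNK = 16 * 1024	# read ~16 KB of decompressed data at a time

gzipFile = gzip.open(filename, 'rb')
csvWriteFile = open(csvFile, 'w')
leftover = ""
while True:
	chunk = gzipFile.read(CHUNK)
	if not chunk:
		break
	lines = (leftover + chunk).split("\n")
	leftover = lines.pop()	# the last piece may be a partial line; carry it into the next chunk
	for line in lines:
		fields = line.strip().replace("\"", "").split(",")
		if fields[0] == "DETAIL":
			csvWriteFile.write(",".join(fields) + "\n")
# don't forget a trailing partial line at end of file
if leftover.strip():
	fields = leftover.strip().replace("\"", "").split(",")
	if fields[0] == "DETAIL":
		csvWriteFile.write(",".join(fields) + "\n")
csvWriteFile.close()
gzipFile.close()

With this approach you can skip the intermediate unzipped temp file entirely, since the filtering happens as the data is decompressed.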