Making text files less than 3 MB?

bkarabinchak.psi · October 27, 2022, 4:51pm

I am working on a new requirement for a project and am wondering how I should approach it.

I currently export csv files end of the day to send via SFTP to some server. I just learned that these files cannot be larger than 3 MB and if they are, need to be turned into 3 MB files. I cannot ZIP the files unfortunately which would have been the easy answer.

The filenames can be slightly different regarding timestamps - right now its just one file that looks like Data2022102703000.txt. So if I needed to split the file and the second one had was like Data2022102703120.txt or similar that is fine.

I am just trying to figure out the best way to do this. These csv files are just the results of a SQL Query.

Method 1 would be just limiting each file to a certain number of lines that I know is less than 2 MB but I am not sure how many lines that would be and tbh seems kind of hacky. Or as im writing the csv file, every 50 rows checking the size and once I hit 2.9 MB its time for a new file.

Method 2: Make the full file and split it into multiple files after the fact with some methodology - what methodology, I am not sure, but I figure there must be one?

Anyone else have to work around requirements like this and how did you do it?

pturmel · October 27, 2022, 4:54pm

I would send hourly.

lrose · October 27, 2022, 4:54pm

Is it the total attachment size that is limited or the individual file size?

bkarabinchak.psi · October 27, 2022, 4:58pm

Individual file sizes. So 12 megs of info = 4 files.

bkarabinchak.psi · October 27, 2022, 4:59pm

Unfortunately I can't send the files mid work order, they only want files for completed work orders, so I have to wait for the whole work order to finish before the query even gets data, and at that point it can be a few megs.

lrose · October 27, 2022, 5:01pm

Could you split by completed work order per shift?

Although, honestly I think compression is the real solution here, unfortunate that you can't ZIP it. Do you have control of the other side of this process, or is the end user going to stitch the files back together?

bkarabinchak.psi · October 27, 2022, 5:04pm

They're stitching it back together lol. They have two other lines that I do this for as well, no size limit, no idea why this is now an issue tbh. Normally the work orders last days so even doing per shift - if the work order is not done, the query isn't giving me any data.

nminchin · October 28, 2022, 1:17pm

Can you use pythons getsizeof() function to get the size of the csv string and then write your multiple files using that to split them (if you know what I mean)?

MMaynard · October 28, 2022, 2:29pm

I would go with an arbitrary number of lines and leave it fixed at that.
So if lets say 2000 lines in the CSV puts it at 2.9 MB, I would create the file as a whole, then do a loop and split it into x number of files for each 2000 lines.
This way you can do the file naming and everything pretty consistently.

bkarabinchak.psi · October 28, 2022, 2:30pm

Yea this is actually what I decided on. Was going to do some testing to figure out how many lines gave me around 2.9 MB and just use that as a hard cut off. Appreciate all the input everyone.

justinedwards.jle · October 29, 2022, 1:00am

This is an interesting problem. Just in case it would help you out, I developed this Jython script for splitting files:

def fileSplitter(pathToFile, fileName, extension, outputSize):
	from java.io import File
	from java.util import Arrays 
	suffix = ".txt" 
	fullPath = pathToFile + fileName + extension
	bytesPerSplit = 1024 * 1024 * outputSize
	numSplits = int(File(fullPath).length() / bytesPerSplit )
	originalBytes = system.file.readFileAsBytes(fullPath)
	for i in range(numSplits + 1):
		if i * int(bytesPerSplit) < len(originalBytes):
			startingRange = i * int(bytesPerSplit)
		else:
			break
		if startingRange + int(bytesPerSplit) < len(originalBytes):
			endingRange = startingRange + int(bytesPerSplit)
		else:
			endingRange = len(originalBytes)
		subBytes = Arrays.copyOfRange(originalBytes, startingRange, endingRange)
		lastDigits = "-" + str(i).zfill(3)
		fullSubPath = pathToFile + fileName + lastDigits + suffix
		system.file.writeFile(fullSubPath, subBytes, False)
pathToFile = "D:\\myPath\\" 
fileName = "Data2022102703000"
extension = ".txt"
outputSize = 2.9
fileSplitter(pathToFile, fileName, extension, outputSize)

When the preceding code is ran using the file that is depicted below with the specified split size of 2.9 mb:

It produces the following output:

Edit: changed output naming convention to make it simpler and compatible with all file names.

bkarabinchak.psi · October 31, 2022, 12:29pm

That is pretty nice, if I hadn't already completed this on Friday I would definitely use your function. Only one problem that my particular system might run into was for a succificently large file this originalBytes = system.file.readFileAsBytes(fullPath) gave me java heap issues. I think that is probably just because I was testing on Designer/script console which had less memory allocated than the gateway but either way I did it with via opening the file with open(filePath) as f: and did it line by line, via a max line count per @MMaynard

Apparently if I the Gateway OS was linux I also could have used the split command and as a parameter fed in 3MB and it would have taken care of the rest for me, but alas I am in windows hell.

I appreciate all the help everyone.

victordcq · October 31, 2022, 1:00pm

bkarabinchak.psi · October 31, 2022, 1:04pm

Yea I tried split but it was an unrecognized command on the windows server 2008 os. Perhaps there was a way to get it to get it onto there but I have already scripted a solution.

victordcq · October 31, 2022, 1:06pm

bkarabinchak.psi · October 31, 2022, 1:08pm

Ignition 7.8 if you really want to grit your teeth!

pturmel · October 31, 2022, 2:36pm

Wrap a buffered reader around a FileInputStream. Reads lines very efficiently.