I am working on a new requirement for a project and am wondering how I should approach it.
I currently export CSV files at the end of the day and send them via SFTP to a server. I just learned that these files cannot be larger than 3 MB, and if they are, they need to be split into multiple files of at most 3 MB each. Unfortunately I cannot ZIP the files, which would have been the easy answer.
The filenames can vary slightly in their timestamps - right now it's just one file that looks like
Data2022102703000.txt. So if I needed to split the file and the second one was named
Data2022102703120.txt or similar, that is fine.
I am just trying to figure out the best way to do this. These CSV files are just the results of a SQL query.
Method 1 would be just limiting each file to a certain number of lines that I know keeps it under the limit, but I am not sure how many lines that would be, and tbh it seems kind of hacky. Or, as I'm writing the CSV file, checking the size every 50 rows, and once I hit 2.9 MB it's time for a new file.
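For what it's worth, the "check the size as you write" variant can be sketched in plain Python roughly like this (the threshold, file names, and function name here are just placeholders, not anything from an existing API):

```python
import csv
import io

def write_rows_split(rows, max_bytes, name_prefix, suffix=".txt"):
    """Write CSV rows across multiple files, starting a new file whenever
    adding the next row would push the current file past max_bytes.
    Returns the list of file paths written, in order."""
    paths = []
    out = None
    written = 0
    for row in rows:
        # Render the row to bytes first so we know its exact on-disk size.
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        line = buf.getvalue().encode("utf-8")
        if out is None or written + len(line) > max_bytes:
            if out is not None:
                out.close()
            path = "%s%03d%s" % (name_prefix, len(paths), suffix)
            paths.append(path)
            out = open(path, "wb")
            written = 0
        out.write(line)
        written += len(line)
    if out is not None:
        out.close()
    return paths
```

Because each row stays whole, the receiving side can just concatenate the pieces back together. You'd call it with something like max_bytes = int(2.9 * 1024 * 1024) to stay safely under the 3 MB cap.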
Method 2: Make the full file and split it into multiple files after the fact with some methodology - what methodology, I am not sure, but I figure there must be one?
Anyone else have to work around requirements like this and how did you do it?
Is it the total attachment size that is limited or the individual file size?
Individual file sizes. So 12 megs of info = 4 files.
Unfortunately I can't send the files mid work order, they only want files for completed work orders, so I have to wait for the whole work order to finish before the query even gets data, and at that point it can be a few megs.
Could you split by completed work order per shift?
Although, honestly I think compression is the real solution here, unfortunate that you can't ZIP it. Do you have control of the other side of this process, or is the end user going to stitch the files back together?
They're stitching it back together lol. They have two other lines that I do this for as well, no size limit, no idea why this is now an issue tbh. Normally the work orders last days so even doing per shift - if the work order is not done, the query isn't giving me any data.
Can you use Python's getsizeof() function to get the size of the CSV string, and then write your multiple files using that to split them (if you know what I mean)?
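One caveat with getsizeof: sys.getsizeof reports the size of the Python object, including interpreter overhead, not the number of bytes the string will occupy in the file. len() of the encoded string is the accurate measure for a size cutoff. A quick illustration:

```python
import sys

row = "col1,col2,col3\n"
obj_size = sys.getsizeof(row)          # Python object size, incl. interpreter overhead
disk_size = len(row.encode("utf-8"))   # bytes this string actually occupies in the file
print(obj_size, disk_size)             # obj_size is noticeably larger than disk_size (15)
```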
I would go with an arbitrary number of lines and leave it fixed at that.
So if, let's say, 2000 lines in the CSV puts it at 2.9 MB, I would create the file as a whole, then loop over it and split it into x files of 2000 lines each.
That way you can handle the file naming and everything pretty consistently.
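That fixed-line-count split can be sketched in plain Python along these lines (the 2000-line default and the -000/-001 naming scheme are just illustrative):

```python
import os

def split_by_lines(src_path, lines_per_file=2000):
    """Split src_path into numbered files of at most lines_per_file lines each.
    Returns the list of paths written."""
    root, ext = os.path.splitext(src_path)   # e.g. "Data2022102703000", ".txt"
    paths = []
    out = None
    with open(src_path, "r") as src:
        for i, line in enumerate(src):
            if i % lines_per_file == 0:      # time for a new output file
                if out is not None:
                    out.close()
                path = "%s-%03d%s" % (root, len(paths), ext)
                paths.append(path)
                out = open(path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return paths
```

Reading the source line by line also avoids loading the whole file into memory at once.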
Yea, this is actually what I decided on. I was going to do some testing to figure out how many lines gave me around 2.9 MB and just use that as a hard cutoff. Appreciate all the input, everyone.
This is an interesting problem. Just in case it would help you out, I developed this Jython script for splitting files:
def fileSplitter(pathToFile, fileName, extension, outputSize):
    from java.io import File
    from java.util import Arrays
    fullPath = pathToFile + fileName + extension
    bytesPerSplit = int(1024 * 1024 * outputSize)
    numSplits = int(File(fullPath).length() / bytesPerSplit)
    originalBytes = system.file.readFileAsBytes(fullPath)
    for i in range(numSplits + 1):
        startingRange = i * bytesPerSplit
        if startingRange < len(originalBytes):
            # The last chunk may be shorter than bytesPerSplit.
            if startingRange + bytesPerSplit < len(originalBytes):
                endingRange = startingRange + bytesPerSplit
            else:
                endingRange = len(originalBytes)
            subBytes = Arrays.copyOfRange(originalBytes, startingRange, endingRange)
            lastDigits = "-" + str(i).zfill(3)
            fullSubPath = pathToFile + fileName + lastDigits + extension
            system.file.writeFile(fullSubPath, subBytes, False)

pathToFile = "D:\\myPath\\"
fileName = "Data2022102703000"
extension = ".txt"
outputSize = 2.9
fileSplitter(pathToFile, fileName, extension, outputSize)
When the preceding code is run against the example file with the specified split size of 2.9 MB, it produces a series of output files named Data2022102703000-000.txt, Data2022102703000-001.txt, and so on. [Screenshots of the source file and the split output omitted.]
Edit: changed output naming convention to make it simpler and compatible with all file names.
That is pretty nice; if I hadn't already completed this on Friday, I would definitely use your function. The one problem my particular system ran into: for a sufficiently large file, this line
originalBytes = system.file.readFileAsBytes(fullPath) gave me Java heap issues. I think that is probably just because I was testing in the Designer script console, which has less memory allocated than the gateway. Either way, I did it by opening the file
with open(filePath) as f: and going line by line, with a max line count per @MMaynard.
Apparently, if the Gateway OS were Linux, I also could have used the
split command, passed it 3 MB as a parameter, and it would have taken care of the rest for me, but alas I am in Windows hell.
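For the record, on Linux that would look something like the following (-C caps each piece at the given size without breaking lines, unlike -b, which splits on raw byte boundaries; the file name is just the example from earlier in the thread):

```shell
# Split into pieces of at most 3 MB each without breaking lines in half,
# producing Data2022102703000.txt.000, .001, ... (-d = numeric suffixes)
split -C 3M -d -a 3 Data2022102703000.txt Data2022102703000.txt.
```

The pieces can then be rejoined on the other end with a simple cat of the parts in order.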
I appreciate all the help everyone.
Yea, I tried split, but it was an unrecognized command on the Windows Server 2008 OS. Perhaps there was a way to get it onto there, but I have already scripted a solution.
Ignition 7.8 if you really want to grit your teeth!
Wrap a BufferedReader around a FileInputStream. It reads lines very efficiently.