[SOLVED] Splitting PDF (blob)

I have a user who generates a report in Excel and exporting as a PDF. I want the user to be able to upload the PDF to Ignition. One issue I am having is that the report is exceptionally long, and when displayed on a dashboard, the font is too small. The location they will upload this to will be on a slide show. I want to split this long pdf into two separate pdfs and display them on different slides. I am trying to figure out how I can accomplish this. Any suggestions would be appreciated.

I currently have PDFs being uploaded and saved using the blob server module. Automation Professionals' Blob Server Module . I then wrote some code with FileUpload onFileReceived that writes to a tag.

    blobID = 1
    tagID = "[default]Blob Info/Safety 1"
    
    # Grab the file name and data
    filename = event.file.name
    filedata = event.file.getBytes()

    query = "SELECT blob as Content, 'application/pdf' as ContentType from files where id = ?"
    args = [blobID]
    db = "PostgreSQLPersonal"
    temp = system.db.runPrepQuery(query, args, db).getRowCount()

    if temp == 0:
        query = "INSERT INTO files (id, blob) VALUES (?, ?)"
        args = [blobID,filedata]
        db = "PostgreSQLPersonal"
        system.db.runPrepUpdate(query, args, db)
    else:
        query = "UPDATE files SET blob = ? WHERE id = ?"
        args = [filedata,blobID]
        db = "PostgreSQLPersonal"
        system.db.runPrepUpdate(query, args, db)

    system.tag.writeBlocking([tagID], [""])
    sleep(1)
    system.tag.writeBlocking([tagID], ['http://xx.xx.xxx.xxx/system/blob/SafetySlides/SimpleBlob?id='+str(blobID)])

This works as expected. Is it possible to split filedata into two pdfs, or is there a different way I should approach it?

I use the community version of pdftk on the command line to perform such tasks. For your case, you probably want to use its "burst" mode, which yields single pages per file.

You will need to save the filedata to a temporary location, run pdftk on it in that folder, then examine the folder for the output files.

2 Likes

Can I trigger pdftk to run from the Ignition side? I know in Vision I could probably use system.util.execute. However, in perspective, I am not sure how I would do this. I could probably get it to work by writing a Python script that runs os.system and running it with Flask or Bottle, but I am sure there is a more elegant solution.

Sure. For Perspective, you would install it on the gateway. I strongly recommend not using system.util.execute. It is only fire-and-forget, no way to get result status or tell that it is done. Always use java's ProcessBuilder. Many examples here on this forum.

This sounds great! I did not know that ProcessBuilder was a promising approach for perspective. I would have to run this in the Gateway Event Script, correct? My current plan now is.
(Perspective)

  1. Read in file through FileUpload (readFileAsBytes)
  2. Write the blob information to a Memory String Tag (writeBlocking)
    (Gateway Events)
  3. Have a Tag Change Gateway Event that reads the new tag on the change
  4. Create a pdf at a specified location (writeFile)
  5. Run ProcessBuilder on pdf to generate two pdfs
  6. Read back in both pdfs (readFileAsBytes)
  7. Write both blobs from step 6 to SQL
  8. Refresh display by resetting urls

Anything you can think of that I should approach differently? I may ask a question later on regarding ProcessBuilder/pdftk as I have not used either.

Perspective scripts are already running on the gateway. There's no need to write the BLOB to a tag, and in fact it's a bad idea, because it's a huge hit on the internal DB for no reason.

I would directly write the file data to the location you want in the file upload receive event, then kick off an async thread (from the event) that runs a project library script to split the PDF file up (blocking) then invokes another project library script to run the insert.

1 Like

I fully agree. Thanks for your suggestions. This is now what I have.

def runAction(self, event):
	from time import sleep
		
	blobID = 10
	tagPath = "[default]Blob Info/Daily Report"
	pdfPath = r"directory.pdf"
	
	# Grab the file name and data
	filename = event.file.name
	filedata = event.file.getBytes()
	system.file.writeFile(pdfPath, filedata)
	#Sets the URL path to blob to empty. This will force it to refresh
	system.tag.writeBlocking([tagPath], [""])
	sleep(1)
	
	system.util.invokeAsynchronous(PDFModifier.spliter, [pdfPath])
	system.util.invokeAsynchronous(SQL.inserter, [blobID, filedata, tagPath])

Then in my SQL I have

def inserter (blobID, blobData, tagPath):
	query = "SELECT blob as Content, 'application/pdf' as ContentType from files where id = ?"
	args = [blobID]
	db = "PostgreSQLPersonal"
	temp = system.db.runPrepQuery(query, args, db).getRowCount()

	if temp == 0:
		query = "INSERT INTO files (id, blob) VALUES (?, ?)"
		args = [blobID,blobData]
		db = "PostgreSQLPersonal"
		system.db.runPrepUpdate(query, args, db)
	else:
		query = "UPDATE files SET blob = ? WHERE id = ?"
		args = [blobData,blobID]
		db = "PostgreSQLPersonal"
		system.db.runPrepUpdate(query, args, db)

	system.tag.writeBlocking([tagPath], ['http://xx.xx.xx.xx:8088/system/blob/ShopFloorSlides/SimpleBlob?id='+str(blobID)])

I still need to finish the PDF splitter code.

Unfortunately, I do not believe pdftk will work. My pdf is only one page. Burst appears to split apart a PDF with multiple pages into separate pdfs (nice to know this exists and will use it in future applications). My best bet after a quick google search is to write a Python script. I plan on writing it up tomorrow should be pretty short, sweet, and to the point. If I get it working, I'll post it here for future reference. Thanks, both of you so far for your help.

You can probably write that SQL as a single statement upsert with an on conflict update clause:

As for your core splitting problem, could you have the end user upload the Excel file directly and do the Excel -> PDF step on the server? Splitting a well structured Excel file is going to be "easier" than splitting a PDF. Or maybe cycle it through a PDF printer driver that renders each "page" in the mega PDF to fit onto a certain 'paper size', and outputs a paginated PDF.

1 Like

I like both ideas! I'll get back with you tomorrow on how both things go.

Posting this a little later than tomorrow, but I figured I should post it. I could not pull in the file directly from Excel due to the server I have Ignition on not having Excel installed, and every Python library I found didn't work with the Excel sheet I had as it had graphs I wanted to pull in. This is how I managed to pull in the PDF.

import sys, getopt
from PyPDF2 import PdfWriter, PdfReader

def cropPDF(x, y, w, h, readLocation, saveLocation, p):
    with open(readLocation, "rb") as in_f:
        input1 = PdfReader(in_f)
        output = PdfWriter()
        page = input1.pages[p]

        # Get the coordinates of the upper-left corner of the page
        page_x, page_y = page.cropbox.upper_left
        upperLeft = [page_x.as_numeric(), page_y.as_numeric()]
        
        # Calculate the new coordinates for cropping
        new_upperLeft  = (upperLeft[0] + x, upperLeft[1] - y)
        new_lowerRight = (new_upperLeft[0] + w, new_upperLeft[1] - h)              
        
        # Set the new crop box coordinates
        page.cropbox.upper_left  = new_upperLeft
        page.cropbox.lower_right = new_lowerRight
        
        # Add the modified page to the output PDF
        output.add_page(page)
        
        # Save the modified PDF to the specified location
        with open(saveLocation, "wb") as out_f:
            output.write(out_f)

def main(argv):
    i = ''
    o = ''
    x, y, w, h, p = 0, 0, 0, 0, 0
    opts, args = getopt.getopt(argv, "x:y:w:h:i:o:p:")
    
    # Parse command line arguments
    for opt, arg in opts:
        # Starting X position
        if opt in ['-x']:
            x = float(arg)
        # Starting Y position
        elif opt in ['-y']:
            y = float(arg)
        # Width
        elif opt in ['-w']:
            w = float(arg)
        # Height
        elif opt in ['-h']:
            h = float(arg)
        # Input file
        elif opt in ['-i']:
            i = arg
        # Output file
        elif opt in ['-o']:
            o = arg
        # Page Number
        elif opt in ['-p']:
            p = int(arg)
    
    # Call the cropPDF function with the provided arguments
    cropPDF(x, y, w, h, i, o, p)

if __name__ == "__main__":
   main(sys.argv[1:])

Barebones, but it works. Hopefully, this will help someone in the future.
On the Ignition side of things simply created a function like this

from java.lang import ProcessBuilder, String
from java.io import File
	
# Function to run a Python script using the specified file path and arguments
def runPython(pyFilePath, *args):
    # Construct the command to execute the Python script
    command = [r'C:\Python\Python311\python.exe', pyFilePath] + list(args)
    # Create a ProcessBuilder with the command
    pb = ProcessBuilder(command)  
    # Start the process
    process = pb.start()
    # Read the output from the process
    output = process.getInputStream().readAllBytes()
    # Return the output
    return output

Thanks for you help once again @pturmel and @PGriffith

2 Likes