Parsing pdf file (not viewing) in Ignition

Fabrice_CHAVEROT · November 6, 2014, 12:57pm

Is it possible to parse PDF files with a Python script in Ignition, using a third-party library such as pdfminer, or Jython + pdfbox?

pdibenedetto · November 6, 2014, 2:33pm

Not trivial, but possible.

You could create a module with the Ignition SDK, add the pdfbox jars to it, and then call its functions from Jython code.
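Once the pdfbox jars are visible to the scripting classpath (through a module or otherwise), the Jython side might look roughly like the sketch below. It assumes PDFBox 1.8.x package names (PDFTextStripper moved to org.apache.pdfbox.text in 2.x), and extract_pdf_text is a made-up helper name, not part of pdfbox or Ignition.

```python
def extract_pdf_text(path):
    """Return the text of the PDF at `path` using PDFBox (sketch, PDFBox 1.8.x assumed)."""
    # Imports are local so this script still loads where the jars are missing.
    from java.io import File
    from org.apache.pdfbox.pdmodel import PDDocument
    from org.apache.pdfbox.util import PDFTextStripper  # 1.8.x location (assumption)

    document = PDDocument.load(File(path))
    try:
        # getText() walks every page and returns the extracted text as one string.
        return PDFTextStripper().getText(document)
    finally:
        # Always release the underlying file handle.
        document.close()
```

A button script could then call extract_pdf_text with the path of the file to read and print or store the result.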

pdibenedetto · November 6, 2014, 2:38pm

From a quick look, PDF Miner (I have used pdfbox, not PDF Miner) seems to be a pure Python 2.4 library, not a wrapper around CPython extension code like other third-party libraries that are incompatible with Ignition. So, if you have Ignition 7.7.x, you could add it as a third-party Python library. More info at inductiveautomation.com/forum/vi … 12153&f=50

Regards,

Fabrice_CHAVEROT · November 6, 2014, 4:05pm

I have just tried exactly what you described, with pdfminer.
I added the pdfminer folder to pylib (…\Inductive Automation\Ignition\user-lib\pylib).
Ignition recognizes the new Python library (I get a message informing me that Ignition has found a new library in pylib).

I added the following sample script to a button without any problem:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfpage import PDFTextExtractionNotAllowed
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfdevice import PDFDevice

# Open a PDF file.
fp = open(r'C:\PyhtonScripts\HelloWorld.pdf', 'rb')

# Create a PDF parser object associated with the file object.
parser = PDFParser(fp)

# Create a PDF document object that stores the document structure.
# Supply the password for initialization
# (an empty string here, assuming the file is not encrypted).
password = ''
document = PDFDocument(parser, password)

# Check if the document allows text extraction. If not, abort.
if not document.is_extractable:
    raise PDFTextExtractionNotAllowed

# Create a PDF resource manager object that stores shared resources.
rsrcmgr = PDFResourceManager()

# Create a PDF device object.
device = PDFDevice(rsrcmgr)

# Create a PDF interpreter object.
interpreter = PDFPageInterpreter(rsrcmgr, device)

# Process each page contained in the document.
for page in PDFPage.create_pages(document):
    interpreter.process_page(page)

Unfortunately, I get a Java runtime error:

ERROR [ActionAdapter-MainThread] Error executing script for event: actionPerformed
on component: Button 3.
Traceback (most recent call last):
  File "event:actionPerformed", line 5, in <module>
  File "C:\Users\fabrice.chaverot.ignition\cache\gwlocalhost_8088_8043_main\C0\pylib\pdfminer\pdfinterp.py", line 8, in <module>
    from cmapdb import CMapDB, CMap
  File "C:\Users\fabrice.chaverot.ignition\cache\gwlocalhost_8088_8043_main\C0\pylib\pdfminer\cmapdb.py", line 24, in <module>
    from encodingdb import name2unicode
  File "C:\Users\fabrice.chaverot.ignition\cache\gwlocalhost_8088_8043_main\C0\pylib\pdfminer\encodingdb.py", line 5, in <module>
    from glyphlist import glyphname2unicode
java.lang.ClassFormatError: Invalid method Code length 85551 in class file pdfminer/glyphlist$py
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at org.python.core.BytecodeLoader$Loader.loadClassFromBytes(BytecodeLoader.java:119)
    at org.python.core.BytecodeLoader.makeClass(BytecodeLoader.java:37)
    at org.python.core.BytecodeLoader.makeCode(BytecodeLoader.java:67)
    at org.python.core.imp.createFromSource(imp.java:353)
    at org.python.core.imp.loadFromSource(imp.java:578)
    …

pdibenedetto · November 6, 2014, 4:24pm

Hmm, pretty strange. I’m not an expert, but the error points to a JVM size limit: according to the JVM specification, the bytecode of a single method may not exceed 64 KB, and here the reported code length is 85551. Ignition (through Jython) dynamically compiles each Python module to a Java class, so if a module’s code is too big, as seems to be the case for glyphlist.py, the class fails to load.

I can’t think of a quick workaround for this. Maybe more experienced Jython users or someone from IA could help you.
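A slower workaround, offered only as an untested sketch, would be to patch the library itself. Assuming glyphlist.py is essentially one huge dictionary literal, the same table could be built at import time from a data file, so that the compiled module stays small. The function name load_glyph_table and the "name;HEX HEX" line format are assumptions for illustration, not part of pdfminer.

```python
try:
    unichr  # Python 2 / Jython
except NameError:
    unichr = chr  # Python 3


def load_glyph_table(lines):
    """Build a glyph-name -> unicode-string dict from 'name;HEX [HEX ...]' lines."""
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith('#'):
            continue
        name, codes = line.split(';', 1)
        # A glyph may map to several code points, separated by spaces.
        table[name] = u''.join([unichr(int(code, 16)) for code in codes.split()])
    return table


# Inline sample standing in for the contents of a data file.
sample = [u'# name;code points', u'A;0041', u'fi;0066 0069']
glyphname2unicode = load_glyph_table(sample)
```

In a patched glyphlist.py, glyphname2unicode would be built from a file shipped next to the module rather than from the inline sample, which keeps the large table out of the compiled code.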

You still have the pdfbox approach…

Regards,

Fabrice_CHAVEROT · November 6, 2014, 7:01pm

Yes, I am going to look at the pdfbox approach; I have just downloaded the SDK. Thanks for your advice. Regards