I want to extract data from a PDF file. I was able to get the data I wanted using PyPDF2 in a Python 3.8 environment. Could I install this package in Ignition and use the same code in Ignition scripting? Is there a better approach to get what I want?
This is the code I used, for reference:
import PyPDF2 as pdflib
import re
pdf = pdflib.PdfReader('data.PDF')
txt = u'\n'.join(pg.extract_text() for pg in pdf.pages)
pattern = r'Dnia:.*?\n'
data = re.findall(pattern, txt)
# Regular expression to find dates and numbers
pattern_datesnum = re.compile(r'\d{2}\.\d{2}\.\d{4}|\+?\d[\d\.]*')
result = []
for line in data:
    a = line.split()[1].replace('.', '/')
    b = [string.replace('.', '') for string in line.split()[2:]]
    c = a + " " + " ".join(b)
    result.append(c)
new_list = []
for element in result:
    # Remove "Page: n/m" markers
    element_without_page = re.sub(r'Page:\s*\d+/\d+\s*', '', element)
    # Keep only runs of digits, "-" and "/"
    numbers_and_hyphens = re.findall(r'[-/\d]+', element_without_page)
    # Join the pieces found with single spaces
    new_list.append(' '.join(numbers_and_hyphens))
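In case it helps anyone answering, here is a minimal sketch of the same parsing steps wrapped in a function that takes plain text, so it can be tried without PyPDF2 or a PDF file. It only uses the standard `re` module. The function name `parse_dnia_lines` and the sample text are made up for illustration; the real layout of my file may differ.

```python
import re


def parse_dnia_lines(txt):
    """Return one cleaned string per 'Dnia:' line found in txt."""
    cleaned = []
    for line in re.findall(r'Dnia:.*?\n', txt):
        parts = line.split()
        if len(parts) < 2:
            continue  # nothing after "Dnia:", so there is no date to read
        # Second token is the date: 01.02.2024 becomes 01/02/2024
        date = parts[1].replace('.', '/')
        # Remaining tokens: drop the dots used as thousands separators
        values = [p.replace('.', '') for p in parts[2:]]
        joined = ' '.join([date] + values)
        # Remove "Page: n/m" markers
        joined = re.sub(r'Page:\s*\d+/\d+\s*', '', joined)
        # Keep only runs of digits, "-" and "/"
        cleaned.append(' '.join(re.findall(r'[-/\d]+', joined)))
    return cleaned


# Hypothetical sample text, not taken from the real PDF
sample = u"Header\nDnia: 01.02.2024 1.234 -5.678 Page: 1/3\nfooter\n"
print(parse_dnia_lines(sample))
```

With this split, only the step that produces `txt` depends on the PDF library, so that is the only part that would need to change if PyPDF2 cannot be used in Ignition.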