DOCX - Apache POI

I've been looking for a way to read and edit Word (.docx) files and ended up reading through a few posts here.

I read through some of the Apache POI documentation for working with Word, tried import org.apache.poi.xwpf, and got ImportError: No module named xwpf
I then tried import org.apache.poi.ss from the second forum post, and that works without a problem. Can anyone steer me in the right direction to get started with reading docx files?

Jython doesn't permit importing Java packages in their entirety. You must import the specific classes you wish to use, by name (no wildcards).
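
For reading a .docx, a minimal sketch of that pattern might look like the following. It assumes the code runs under Jython with the Apache POI OOXML jars available; the function name read_docx_text is hypothetical, and the Java imports sit inside the function only so that the definition loads even where POI is not present.

```python
def read_docx_text(path):
    """Return the text of each body paragraph in a .docx file (Jython + Apache POI)."""
    # Import the classes by name; a bare "import org.apache.poi.xwpf" is what
    # raised the ImportError in the question.
    from java.io import FileInputStream
    from org.apache.poi.xwpf.usermodel import XWPFDocument

    fis = FileInputStream(path)
    try:
        doc = XWPFDocument(fis)
        # getParagraphs() covers body paragraphs only; text inside tables
        # has to be reached through doc.getTables() instead.
        return [p.getText() for p in doc.getParagraphs()]
    finally:
        fis.close()

# Hypothetical usage under Jython:
# for line in read_docx_text("C:/Temp/template.docx"):
#     print(line)
```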

Thanks for that, helped a lot.

For anyone else getting started with modifying a template Word (.docx) document that has placeholder values in tables (not nested tables), here's some code to get started:

import java.io.FileOutputStream as FileOutputStream
import java.io.FileInputStream as FileInputStream
import org.apache.poi.xwpf.usermodel.XWPFDocument as XWPFDocument
import org.apache.poi.xwpf.usermodel.XWPFParagraph as XWPFParagraph
import org.apache.poi.xwpf.usermodel.XWPFRun as XWPFRun
import org.apache.poi.xwpf.usermodel.XWPFTable as XWPFTable
import org.apache.poi.xwpf.usermodel.XWPFTableRow as XWPFTableRow
import org.apache.poi.xwpf.usermodel.XWPFTableCell as XWPFTableCell

location = "C:/Temp/"
in_filename = "template"
out_filename = "new_doc"
placeholder = "${something}"
new_val = "new value"

fis = FileInputStream(location + in_filename + ".docx")
doc = XWPFDocument(fis)
for table in doc.getTables():
	for table_row in table.getRows():
		for table_cell in table_row.getTableCells():
			for paragraph in table_cell.getParagraphs():
				for run in paragraph.getRuns():
					# getText(0) is the run's first text node (None if the run has no text);
					# this is also where to extract text if you are only reading the doc
					text = run.getText(0)
					if text is not None and placeholder in text:
						run.setText(text.replace(placeholder, new_val), 0)

output = FileOutputStream(location + out_filename + ".docx")
doc.write(output)
output.close()
fis.close()
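
One thing to watch for: Word sometimes splits a placeholder such as ${something} across several runs, in which case no single run contains it. A rough sketch of one way around that is below. The helper replace_across_runs is hypothetical (it is not part of POI) and works on plain strings, so it can be tried outside Jython. It assumes it is acceptable to move the whole paragraph's text into the first run, which discards any per-run formatting differences.

```python
def replace_across_runs(run_texts, placeholder, new_val):
    """Replace placeholder in the concatenated text of a paragraph's runs.

    run_texts is a list of run.getText(0) values (entries may be None).
    If the placeholder is found, the replaced text is returned in the first
    slot and every other slot is emptied; otherwise the input is returned
    unchanged as a new list.
    """
    joined = "".join(t or "" for t in run_texts)
    if placeholder not in joined:
        return list(run_texts)
    replaced = joined.replace(placeholder, new_val)
    return [replaced] + [""] * (len(run_texts) - 1)

# Hypothetical use inside the paragraph loop under Jython:
# runs = paragraph.getRuns()
# texts = [run.getText(0) for run in runs]
# new_texts = replace_across_runs(texts, placeholder, new_val)
# if new_texts != texts:
#     for run, text in zip(runs, new_texts):
#         run.setText(text, 0)
```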

Those imports could be simplified to:

from java.io import FileOutputStream, FileInputStream
from org.apache.poi.xwpf.usermodel import XWPFDocument, XWPFParagraph, XWPFRun, XWPFTable, XWPFTableRow, XWPFTableCell