The error you are getting is caused by non-XML content at the top and bottom of the file. Here is an edited version of your test file with the extraneous content removed:
sample_CustomersOrders.xml (15.1 KB)
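If you would rather clean the file in the script than by hand, one minimal sketch is to keep only the text from the first `<` to the last `>` before parsing. The helper name `strip_non_xml` and the sample text are hypothetical, and this assumes the extra header and footer lines contain no `<` or `>` characters:

```python
from xml.etree import ElementTree

def strip_non_xml(text):
    """Hypothetical helper: keep only the text from the first '<' to the last '>'.

    Assumes the extraneous header/footer lines contain no '<' or '>' characters.
    """
    start = text.find('<')
    end = text.rfind('>')
    if start == -1 or end < start:
        raise ValueError('no XML markup found')
    return text[start:end + 1]

# Hypothetical raw export with non-XML lines before and after the document
raw = """Exported by some tool
<Root><Customers><Customer><CompanyName>Example Co</CompanyName></Customer></Customers></Root>
-- end of export --"""

root = ElementTree.fromstring(strip_non_xml(raw))
print(root.tag)  # Root
```

Parsing `raw` directly would raise `ElementTree.ParseError`, because only whitespace, comments, and processing instructions may appear outside the root element.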
Another problem I see is that there seem to be two datasets in that file: one for customers and one for orders. Messing around with this, I was able to parse the file into separate datasets in two different ways.
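A quick way to see the two datasets is to list the direct children of the root element: each one is a separate table whose records are its own children. The sample below is a hypothetical, trimmed-down stand-in for the real file:

```python
from xml.etree import ElementTree

# Hypothetical, trimmed-down sample with the same two-section layout
sample = """<Root>
  <Customers>
    <Customer><CompanyName>Example Co</CompanyName></Customer>
    <Customer><CompanyName>Sample Ltd</CompanyName></Customer>
  </Customers>
  <Orders>
    <Order><CustomerID>EXMPL</CustomerID></Order>
  </Orders>
</Root>"""

root = ElementTree.fromstring(sample)

# Each direct child of the root is its own table; len() counts its child elements (records)
sections = [(section.tag, len(section)) for section in root]
print(sections)  # [('Customers', 2), ('Orders', 1)]
```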
Here is a Python version that follows the same approach as your script above:
Python XML Parser
```python
from xml.etree import ElementTree

#filePath = # Get your file (path to the XML file)

# Parse the XML file
root = ElementTree.parse(filePath).getroot()

# Create a list for the datasets that will be derived from each node
datasets = []
for node in root:
    # Use a set to keep track of all unique tags (to be used as headers)
    tags = set()
    # Find all tags in the node to use as headers
    for subnode in node:
        for child in subnode.iter():
            tags.add(child.tag)
    # Get the headers and sort them
    headers = sorted(list(tags))
    # Create a list for the rows
    data = []
    # Process each sub-node as a row, and add the row to the data list
    for subnode in node:
        entry = {}
        for child in subnode.iter():
            # Skip container elements, whose text is only indentation whitespace
            if '\n' not in (child.text or ''):
                entry[child.tag] = child.text or ''
        row = [entry.get(header, '') for header in headers]
        data.append(row)
    # Convert the headers and data to a dataset, and add them to the datasets list
    datasets.append(system.dataset.toDataSet(headers, data))
```
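Because `system.dataset.toDataSet` only exists inside Ignition, one way to check the parsing logic in a plain Python interpreter is to wrap it in a function that returns the headers and rows instead. The function name `node_to_table` and the sample XML are hypothetical:

```python
from xml.etree import ElementTree

def node_to_table(node):
    """Hypothetical wrapper: same logic as the script above, but returns
    (headers, rows) instead of calling system.dataset.toDataSet."""
    tags = set()
    for subnode in node:
        for child in subnode.iter():
            tags.add(child.tag)
    headers = sorted(tags)
    rows = []
    for subnode in node:
        entry = {}
        for child in subnode.iter():
            # Container elements hold only indentation whitespace (with '\n'), so skip them
            if '\n' not in (child.text or ''):
                entry[child.tag] = child.text or ''
        rows.append([entry.get(header, '') for header in headers])
    return headers, rows

# Hypothetical sample with a nested FullAddress element
sample = """<Root>
  <Customers>
    <Customer>
      <CompanyName>Example Co</CompanyName>
      <FullAddress>
        <City>Eugene</City>
        <Country>USA</Country>
      </FullAddress>
    </Customer>
  </Customers>
</Root>"""

root = ElementTree.fromstring(sample)
headers, rows = node_to_table(root[0])
print(headers)  # ['City', 'CompanyName', 'Country', 'Customer', 'FullAddress']
print(rows)     # [['Eugene', 'Example Co', 'USA', '', '']]
```

Note that the container tags (`Customer` and `FullAddress`) still come out as columns that are always empty: the header loop collects every tag, while the row loop skips any element whose text contains a newline.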
Here is the result, with the customer dataset on top and the orders dataset on the bottom:
Here is the other approach I put together, which works similarly but uses the Java DOM API:
Jython XML Parser
```python
from javax.xml.parsers import DocumentBuilderFactory

#xmlFile = # Get your file (a java.io.File, an InputStream, or a URI string)
document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile)

# Get the root element
root = document.documentElement

# Create a list for the datasets that will be derived from each node
datasets = []

# Iterate over each child node of the root element
for rootIndex in range(root.childNodes.length):
    node = root.childNodes.item(rootIndex)
    if node.nodeType == node.ELEMENT_NODE:
        # Collect all unique tags (for headers) from the children of this node
        headers = set()
        nodeList = node.childNodes
        for index in range(nodeList.length):
            childNode = nodeList.item(index)
            if childNode.nodeType == childNode.ELEMENT_NODE:
                childNodeList = childNode.childNodes
                for subIndex in range(childNodeList.length):
                    subChildNode = childNodeList.item(subIndex)
                    if subChildNode.nodeType == subChildNode.ELEMENT_NODE:
                        headers.add(subChildNode.nodeName)
        headers = sorted(list(headers))
        # Create a list to hold all rows of data
        data = []
        # Process each child node to produce a list of rows for each dataset
        for index in range(nodeList.length):
            childNode = nodeList.item(index)
            if childNode.nodeType == childNode.ELEMENT_NODE:
                entry = {}
                childNodeList = childNode.childNodes
                for subIndex in range(childNodeList.length):
                    subChildNode = childNodeList.item(subIndex)
                    if subChildNode.nodeType == subChildNode.ELEMENT_NODE:
                        # textContent concatenates the text of all nested elements
                        entry[subChildNode.nodeName] = subChildNode.textContent
                row = [entry.get(header, '') for header in headers]
                data.append(row)
        # Convert the headers and data to a dataset, and add them to the datasets list
        datasets.append(system.dataset.toDataSet(headers, data))
```
Here is the result, with the customer dataset on top and the orders dataset on the bottom:
In this version, there are fewer columns because it only reads the direct children of each record, and `textContent` returns the combined text of everything nested inside an element. As a result, the FullAddress information is combined into a single column in the customer dataset, and the ShipInfo fields are combined into a single column in the orders dataset.
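To see where the combined column comes from without Jython, here is a rough stand-in using Python's `xml.dom.minidom`, which exposes the same `childNodes`/`nodeType`/`nodeName` style of DOM API. minidom has no `textContent` property, so the `text_content` helper below is a hypothetical approximation of it, and the sample XML is hypothetical:

```python
from xml.dom import minidom

def text_content(node):
    """Hypothetical approximation of DOM textContent: concatenate all descendant text."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return ''.join(text_content(child) for child in node.childNodes)

def element_children(node):
    """Direct child elements only (skips whitespace text nodes)."""
    return [child for child in node.childNodes if child.nodeType == child.ELEMENT_NODE]

sample = """<Root>
  <Customers>
    <Customer>
      <CompanyName>Example Co</CompanyName>
      <FullAddress><City>Eugene</City><Country>USA</Country></FullAddress>
    </Customer>
  </Customers>
</Root>"""

root = minidom.parseString(sample).documentElement
customers = element_children(root)[0]
customer = element_children(customers)[0]

# Only the direct element children of <Customer> become columns, as in the Jython script
entry = {}
for child in element_children(customer):
    entry[child.nodeName] = text_content(child)

print(sorted(entry))         # ['CompanyName', 'FullAddress']
print(entry['FullAddress'])  # EugeneUSA
```

In the real file the indentation whitespace inside FullAddress is part of `textContent` too, so the combined cell will also contain the line breaks and spaces between the values.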