“Cannot create PyString with non-byte value” error when creating a DataSet

MikeAllgood · February 25, 2022, 7:27pm

I see there are a number of posts about the “Cannot create PyString with non-byte value” error.
Unfortunately, none of them seems to cover the error as it is being generated in my script.
I’m using the system.dataset.toDataSet(headers, data) function to create a dataset that will be bound to a Perspective Table component.
The code that creates the data list is convoluted, but every value is either hard coded or comes from a DB query. I logged the contents of the list; it is shown below, along with the headers assignment and the toDataSet call:

headers = ['lot', 'qty_ordered', 'qty_complete_last_op', 'qty_rejected', 'next_operation', 'status']
data = [
		['12345678-1', 25, 25, 0, u'Cycle', 'Idle'], 
		['12345678-2', 25, 25, 0, u'Cycle', 'Idle'], 
		['12345678-3', 9, 8, 1, 'Inspection (CURRENT)', 'ACTIVE'], 
		['12345678-4', 30, 30, 0, u'Cycle', 'Idle']
		]
tableDataset = system.dataset.toDataSet(headers, data)

When I take this data list and headers list to the Script Console and create the dataset, it works fine.

My understanding of the error (admittedly a sophomoric interpretation) is that a non-Unicode character is in one of the strings being inserted into the dataset. If that is correct, I’m not seeing the offending character. Maybe it is a nonprintable character (perhaps because it isn’t Unicode).

The version is 8.1.10.

How do I fix this (find the offending item)?

MikeAllgood · February 25, 2022, 8:07pm

A follow-up:
I set up a function that parses a string’s characters and converts each one to its numeric code with the ord() function. I then passed it the strings in my data list whose values come from DB queries (i.e. the ones that are not hard coded). It returns an “is ASCII” status and also prints the code of each character. The largest value is 121 (“y”), so every character is in the ASCII range (< 128).
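The check described above can be extended to the whole nested data list rather than to selected strings. This is an illustrative sketch, not the poster's actual function: `find_non_ascii` is a hypothetical helper that walks lists, tuples and dicts and reports the position and code point of every character at or above 128. It is written to run under both Jython 2.7 and Python 3, and the sample row contains a deliberately planted non-ASCII character.

```python
try:
    STRING_TYPES = basestring  # Python 2 / Jython 2.7: covers str and unicode
except NameError:
    STRING_TYPES = str  # Python 3


def find_non_ascii(value, path=()):
    """Return (path, index, char, code_point) for every character >= 128
    found in any string nested inside lists, tuples or dicts."""
    hits = []
    if isinstance(value, dict):
        for key in value:
            hits.extend(find_non_ascii(value[key], path + (key,)))
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            hits.extend(find_non_ascii(item, path + (i,)))
    elif isinstance(value, STRING_TYPES):
        for i, ch in enumerate(value):
            if ord(ch) > 127:
                hits.append((path, i, ch, ord(ch)))
    # Numbers, None, etc. cannot contain characters, so they are skipped.
    return hits


# Hypothetical sample data with one planted non-ASCII character (U+2179).
data = [
    ['12345678-1', 25, 25, 0, u'Cycle', 'Idle'],
    ['12345678-3', 9, 8, 1, u'Inspection \u2179', 'ACTIVE'],
]
for path, index, ch, code in find_non_ascii(data):
    print("row/col %r, char %d: U+%04X" % (path, index, code))
```

In this thread such a scan would come back empty: the data itself was clean, and the offending character came from the dataset's own string form, as the answer further down explains.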

What am I missing?

PGriffith · February 25, 2022, 8:08pm

Can you post a full stacktrace, if you’ve got one?

MikeAllgood · February 25, 2022, 8:44pm

Do you mean this:

com.inductiveautomation.ignition.common.script.JythonExecException: ValueError: Cannot create PyString with non-byte value
at org.python.core.Py.ValueError(Py.java:334)
at org.python.core.PyString.str_format(PyString.java:4042)
at org.python.core.PyString$str_format_exposer.__call__(Unknown Source)
at org.python.core.PyObject.__call__(PyObject.java:461)
at org.python.core.PyObject.__call__(PyObject.java:465)
at org.python.pycode._pyx785.buildLookupDataset$1(:185)
at org.python.pycode._pyx785.call_function()
at org.python.core.PyTableCode.call(PyTableCode.java:173)
at org.python.core.PyBaseCode.call(PyBaseCode.java:134)
at org.python.core.PyFunction.__call__(PyFunction.java:416)
at org.python.pycode._pyx786.onMessageReceived$1(:2)
at org.python.pycode._pyx786.call_function()
at org.python.core.PyTableCode.call(PyTableCode.java:173)
at org.python.core.PyBaseCode.call(PyBaseCode.java:306)
at org.python.core.PyFunction.function___call__(PyFunction.java:474)
at org.python.core.PyFunction.__call__(PyFunction.java:469)
at org.python.core.PyFunction.__call__(PyFunction.java:464)
at com.inductiveautomation.ignition.common.script.ScriptManager.runFunction(ScriptManager.java:849)
at com.inductiveautomation.ignition.common.script.ScriptManager.runFunction(ScriptManager.java:831)
at com.inductiveautomation.ignition.gateway.project.ProjectScriptLifecycle$TrackingProjectScriptManager.runFunction(ProjectScriptLifecycle.java:689)
at com.inductiveautomation.ignition.common.script.ScriptManager$ScriptFunctionImpl.invoke(ScriptManager.java:1000)
at com.inductiveautomation.ignition.gateway.project.ProjectScriptLifecycle$AutoRecompilingScriptFunction.invoke(ProjectScriptLifecycle.java:754)
at com.inductiveautomation.perspective.gateway.script.ScriptFunctionHelper.invoke(ScriptFunctionHelper.java:133)
at com.inductiveautomation.perspective.gateway.model.MessageHandlerCollection$MessageHandlerImpl$1.lambda$invoke$0(MessageHandlerCollection.java:81)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.python.core.PyException: ValueError: Cannot create PyString with non-byte value
... 28 common frames omitted

pturmel · February 25, 2022, 8:48pm

So, what's on line 185 of this class? I'm guessing a string formatting operation?

MikeAllgood · February 25, 2022, 8:52pm

self.custom.dsTableData = tableDataset

The script function is called by a message handler on the root of the view.
So the above line should be poking the newly created dataset into the custom parameter dsTableData.
I’ve added logging around all the commands at the end of the script function to trap where it’s breaking down. The last log message is the one just before the command

tableDataset = system.dataset.toDataSet(headers, data)

Here is a snippet of the ending portion of this script:

177			data.append([lot, qty_ordered, qty_complete_last_op, qty_rejected, nextOp, status])
178		msg = 'data: {}'.format(data)
179		util.log(ln, msg, session)
180	msg = 'Table Headers: {}'.format(headers)
181	util.log(ln, msg, session)
182	tableDataset = system.dataset.toDataSet(headers, data)
183	msg = 'Table Dataset: {}'.format(tableDataset)
184	util.log(ln, msg, session)
185	self.custom.dsTableData = tableDataset

The last logged message is from the line

msg = 'Table Headers: {}'.format(headers)

BTW, line 185 is the last line in the function.

pturmel · February 25, 2022, 8:57pm

I wonder if the line numbers in the compiled Jython are off by one. What’s the next .format() operation after the assignment to dsTableData?

Or is there a property change action on dsTableData?

MikeAllgood · February 25, 2022, 9:02pm

No property change actions on the table or the custom parameter.
The message handler has one line of code to call the script file function:

Views.work_order_lookup.buildLookupDataset(self)

I’m at a loss. I’ve forced all strings to Unicode in my script with no change in the error.
Is it possible that one of the integer values in one or more of the lists in the data list could be causing this?
I’ve focused on the strings as that is how I interpreted the error message.

pturmel · February 25, 2022, 9:03pm

Hmm. Stumped.

MikeAllgood · February 25, 2022, 9:04pm

I broke it good if you’re stumped!

MikeAllgood · February 25, 2022, 9:28pm

I took the data that was being logged and created a hard-coded list.
I put that list, along with the headers list, into a new function and attempted to create and return the dataset from it.
In the new function, instead of pushing the dataset into the self.custom.dsTableData parameter, I simply return it to the message handler and let the message handler write it to the parameter.
I actually tried it both ways, but the following code uses the return method (I’m just trying anything I can think of at this point).
Here is the test function:

def testDataset(self):
	session = self.session
	data = [
			[u'62563789-1', 	u'25', 	u'25', 	u'0', 	u'Debinding Cycle', 			u'Idle'], 
			[u'62563789-2', 	u'25', 	u'25', 	u'0', 	u'Debinding Cycle', 			u'Idle'], 
			[u'62563789-3', 	u'9',	u'8', 	u'1', 	u'RUV Inspection (CURRENT)', 	u'ACTIVE'], 
			[u'62563789-4', 	u'30', 	u'30', 	u'0', 	u'Debinding Cycle', 			u'Idle']
			]
	msg = 'data: {}'.format(data)
	util.log('testDataset', msg, session)
	headers = ['lot', 'qty_ordered', 'qty_complete_last_op', 'qty_rejected', 'next_operation', 'status']
	msg = 'headers: {}'.format(headers)
	util.log('testDataset', msg, session)
	tableDataset = system.dataset.toDataSet(headers, data)
	msg = 'Table Dataset: {}'.format(tableDataset)
	util.log('testDataset', msg, session)
#	self.custom.dsTableData = tableDataset
	return tableDataset

This generates the “PyString” exception too.

PGriffith · February 25, 2022, 9:56pm

Hah. It’s the format operation, indeed.

ds = system.dataset.toDataSet([], [])
print repr(str(ds))
print u"{}".format(ds)
print "{}".format(ds)

The default toString of a Dataset is something like this: Dataset [0R ⅹ 0C]. The x between rows and columns is not an ASCII x but a Roman numeral ten, for reasons lost to history.
Jython, in its implementation of format, doesn’t allow ‘widening’: if you start with a plain (byte) string literal ('Table Dataset: {}'), it uses the same type for the output, and that type can’t hold the Roman numeral. So if you prefix your format string with u before the formatting operation (u'Table Dataset: {}'), it’ll work.
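In the script above, that means changing line 183 to `msg = u'Table Dataset: {}'.format(tableDataset)`. The rule itself can be modelled in plain Python 3. This is a toy model, not Jython's implementation: `model_format` and `FakeDataset` are hypothetical names, and the 0-255 limit is an assumption read off the “non-byte value” wording of the error (a Jython byte string holds one byte per character).

```python
def model_format(template, value, template_is_unicode):
    """Toy model of Jython 2.7 str.format: the result keeps the template's type.

    template_is_unicode=False stands for a plain 'Table Dataset: {}' literal,
    True stands for u'Table Dataset: {}'.
    """
    result = template.format(value)
    # Assumption: a byte-string result can only hold code points 0-255,
    # so anything wider cannot be stored and the call fails.
    if not template_is_unicode and any(ord(ch) > 0xFF for ch in result):
        raise ValueError("Cannot create PyString with non-byte value")
    return result


class FakeDataset(object):
    """Hypothetical stand-in whose str() mimics an Ignition Dataset."""

    def __str__(self):
        return u"Dataset [4R \u2179 6C]"  # U+2179, SMALL ROMAN NUMERAL TEN


try:
    model_format("Table Dataset: {}", FakeDataset(), template_is_unicode=False)
except ValueError as e:
    print("byte template fails: %s" % e)

# With a unicode template the result type is wide enough, so this succeeds.
fixed = model_format(u"Table Dataset: {}", FakeDataset(), template_is_unicode=True)
```

The model also shows why the hard-coded test function failed in exactly the same way: the data never mattered, only the type of the template the dataset was formatted into.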

Interestingly, CPython has a similar error, but in a different set of circumstances:

>>> "{}".format("Ⅹ")
'\xe2\x85\xa9'
>>> "{}".format(u"Ⅹ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2169' in position 0: ordinal not in range(128)
>>> u"{}".format(u"Ⅹ")
u'\u2169'
>>> u"{}".format("Ⅹ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

pascal.fragnoud · February 28, 2022, 2:41pm

¯\_(ツ)_/¯

Michael_Lindon · February 8, 2023, 1:23am

So I ran into this issue when using str() on a list or dictionary that contained a dataset; it caused the same error as above.

I wasn't sure how to add a "u" to the front of a dataset inside a list without it being too much effort, so a teammate and I investigated and found two solutions:

Convert the dataset to a PyDataset. Inside a list/dict passed to str(), it is then output as a list of lists of values.

[Screenshot]

Edit: for clarity, the V_Util_QV is an overridden version of a Qualified Value. The asJson method loops through the qualified value converting the nested qualified values to Lists/Dictionaries as appropriate and converts Datasets to PyDatasets.

Override the Dataset class and replace its toString() method; the output is then whatever format you want.

Hope this helps anyone with this issue.

Wishful thinking: Griffith, would it be possible to change the toString method of the core Dataset class to remove this issue? For instance, if it used a plain x instead of the Roman numeral, would that fix it without anyone being the wiser?

PGriffith · February 8, 2023, 1:35am

I mean, yes, it's possible, but we're unlikely to change it because it's not actually broken (and, it turns out, it's good at surfacing issues like this with things that require ASCII when they really shouldn't). The core problem here is you're asking Jython for a str, but you have a value that can't fit into ASCII.

Do you get the same error if you use the unicode builtin, instead of str, around whatever you're trying to turn into a string?

Also, consider looking into the difference between repr and str, since PyDatasets properly implement both.

Michael_Lindon · February 8, 2023, 4:48pm

I checked repr, unicode, and str against both a Dataset and a PyDataset; each gave the same result, with the expected minor formatting differences. I also tested PyDatasets with repr and str, and the output is still inconsistent depending on whether it is in a list or on its own.

I agree a Dataset isn't ASCII and therefore probably should never have been made stringable; however, I presume you can't take that functionality away because of backwards compatibility. So the request is that it be made consistent. I fully understand that such a change involves considerations I have no way to assess from my position.

PGriffith · February 8, 2023, 5:54pm

Frankly, I think we're talking past each other.

You revived this thread about an exception being thrown. The cause of that exception being thrown is well known and understood. You're doing something nebulous in a project script that, apparently, relies on the string representation of datasets/pydatasets. I can't really put this in any other way: that's a bad idea. There's basically no circumstance where you should be programmatically relying on what a dataset looks like when naively converted to a string. Without seeing your actual script, I don't know what you're doing, but if it's ultimately serialization/deserialization, you absolutely must use a well-defined format like JSON.
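As a sketch of the JSON advice, assuming Jython 2.7's standard `json` module and the accessor methods Ignition's Dataset exposes (`getColumnNames`, `getRowCount`, `getColumnCount`, `getValueAt`), a `default` hook for `json.dumps` can turn any dataset-like object into explicit columns and rows instead of relying on toString(). `dataset_to_json_default` and `FakeDataset` are hypothetical names; `FakeDataset` only exists so the example runs outside Ignition.

```python
import json


class FakeDataset(object):
    """Hypothetical stand-in with the same accessor names as an Ignition Dataset."""

    def __init__(self, headers, rows):
        self._headers = list(headers)
        self._rows = [list(row) for row in rows]

    def getColumnNames(self):
        return self._headers

    def getColumnCount(self):
        return len(self._headers)

    def getRowCount(self):
        return len(self._rows)

    def getValueAt(self, row, col):
        return self._rows[row][col]


def dataset_to_json_default(obj):
    """json.dumps default hook: serialize dataset-like objects explicitly."""
    if hasattr(obj, "getColumnNames") and hasattr(obj, "getValueAt"):
        return {
            "columns": list(obj.getColumnNames()),
            "rows": [
                [obj.getValueAt(r, c) for c in range(obj.getColumnCount())]
                for r in range(obj.getRowCount())
            ],
        }
    # Anything else gets json's usual "not serializable" failure.
    raise TypeError("Object of type %s is not JSON serializable" % type(obj).__name__)


payload = {"table": FakeDataset(["lot", "qty_ordered"], [["12345678-1", 25]])}
text = json.dumps(payload, default=dataset_to_json_default, sort_keys=True)
```

Values such as dates would need their own branch in the hook; the point is that the serialized form is defined by the script rather than by whatever toString() happens to print.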

As previously mentioned - there's not really any bug here.
From your last post, it sounds like you want str([someBasicDataset]) to return something other than [Dataset [0R ⅹ 0C]]. That doesn't happen now because when Jython wraps a native Java object, it defines __str__() and __repr__() as forwarding to toString(), which is available on every Java object; Java doesn't have the distinction Python does. PyDataset, however, does, and because __str__() on a list is defined as calling __repr__() on every element, a list containing PyDatasets is output 'correctly', consistent with its __repr__():

ds = system.dataset.toDataSet([], [[]])
pds = system.dataset.toPyDataSet(ds)
l = [ds, pds]

print "str(ds) =", str(ds)
print "str(pds) =", str(pds)
print "repr(ds) =", repr(ds)
print "repr(pds) =", repr(pds)
print "str(l) =", str(l)
print "repr(l) =", repr(l)

>>> 
str(ds) = Dataset [0R ⅹ 0C]
str(pds) = <PyDataset rows:0 cols:0>
repr(ds) = Dataset [0R ⅹ 0C]
repr(pds) = []
str(l) = [Dataset [0R ⅹ 0C], []]
repr(l) = [Dataset [0R ⅹ 0C], []]
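The container rule described above is ordinary Python behaviour and can be reproduced without Ignition. In this toy sketch (class names are hypothetical), `JavaLike` forwards both `__str__` and `__repr__` to a `toString()` method, as the post says Jython does for wrapped Java objects, while `PyDatasetLike` defines them separately, as PyDataset does.

```python
class JavaLike(object):
    """Models a wrapped Java object: str() and repr() both call toString()."""

    def toString(self):
        # Plain ASCII "x" so the toy also runs under Jython 2.7.
        return "Dataset [0R x 0C]"

    def __str__(self):
        return self.toString()

    __repr__ = __str__


class PyDatasetLike(object):
    """Models PyDataset: distinct str() and repr()."""

    def __str__(self):
        return "<PyDataset rows:0 cols:0>"

    def __repr__(self):
        return "[]"


ds, pds = JavaLike(), PyDatasetLike()
print(str(ds))         # Dataset [0R x 0C]
print(str(pds))        # <PyDataset rows:0 cols:0>
print(str([ds, pds]))  # [Dataset [0R x 0C], []]  (list str() uses each element's repr())
```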

There's nothing inconsistent; Jython is doing the best it can with Dataset. If you want more specific behavior, convert to PyDataset. If you can show me an SSCCE of something actually inconsistent, then we have something to fix.

Michael_Lindon · February 9, 2023, 3:03pm

I dug deeper and found that nested Qualified Values output HashMaps, which don't handle datasets well, whereas dictionaries and lists deal with them fine. I thought we were stripping out the HashMaps, but after further review we weren't; replacing the HashMap with a dictionary fixed the issues.

I apologize for the confusion.