system.file.writeFile encoding

Unlike system.file.readFileAsString, system.file.writeFile has no encoding option.
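
For comparison, readFileAsString takes an optional second argument naming the charset (the path below is just a placeholder):

contents = system.file.readFileAsString('/tmp/example.txt', 'UTF-8')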

The behaviour appears to be the same as readFileAsString when no encoding is given: it uses Java's default charset, which is platform-specific. On some platforms that is always UTF-8, but on others it varies with the system or user locale settings.
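
You can check which charset a given machine is using by asking Java directly from a script console (this is plain Java API, nothing Ignition-specific):

from java.nio.charset import Charset
print Charset.defaultCharset()  # the platform default, e.g. UTF-8 here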

On my Linux machine it uses UTF-8, but on the customer’s Windows machine it uses latin1. Transferring files between these systems leads to invalid encoding errors in one direction and double-encoding in the other.
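
To illustrate the double-encoding direction, here is the round trip in plain Python 2 (a sketch, not Ignition-specific):

text = u'caf\xe9'                    # u'café' with an accented e
utf8 = text.encode('utf8')           # 'caf\xc3\xa9', as written on Linux
mojibake = utf8.decode('latin1')     # what a latin1 reader sees (two junk characters)
print repr(mojibake.encode('utf8'))  # 'caf\xc3\x83\xc2\xa9' (double-encoded)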

You can manually encode a unicode string in Python with .encode('utf8'). However, the result is a str object, so system.file.writeFile still applies its own encoding on top. On Linux, this produces a double-UTF-8-encoded file:

system.file.writeFile(path, output.encode('utf8'))  # the UTF-8 bytes in the str get encoded again

The only way to stop system.file.writeFile from encoding at all is to give it an array of bytes rather than a string. Unfortunately, bytearray() is not available in Python 2.5. In 2.6 you could just do this:

system.file.writeFile(path, bytearray(output.encode('utf8')))  # bytes are written as-is, no re-encoding

In 2.5 you can use the array module instead:

from array import array
# a 'b' (signed byte) array is passed through to writeFile untouched
system.file.writeFile(path, array('b', output.encode('utf8')))

Or write the file directly with Python:

from __future__ import with_statement  # with is not enabled by default in 2.5
import codecs
# codecs.open does the UTF-8 encoding itself, bypassing writeFile entirely
with codecs.open(path, 'w', 'utf8') as f:
    f.write(output)
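
In the meantime I've wrapped the workaround in a small helper, so only one place needs fixing if an encoding option ever arrives (write_file_utf8 is just my own name for it):

from array import array

def write_file_utf8(path, text):
    # encode ourselves, then hand writeFile raw bytes it cannot re-encode
    system.file.writeFile(path, array('b', text.encode('utf8')))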

But it would be really great if an encoding option could be added to system.file.writeFile.

This is a solid idea - I’ve added an internal ticket to keep track of it.

Excellent. Thanks very much.