system.file.writeFile encoding

Unlike system.file.readFileAsString, system.file.writeFile has no encoding option.
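
For comparison, readFileAsString takes an optional second argument naming the charset (the path below is just a placeholder):

contents = system.file.readFileAsString('/tmp/example.txt', 'UTF-8')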

The behaviour appears to be the same as readFileAsString when no encoding is given: it uses Java's default charset, which is platform-specific. On some platforms that is always UTF-8, but on others it varies with the system or user locale settings.
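
You can check which charset a given machine is using by asking Java directly from a script console (this is plain Java API, nothing Ignition-specific):

from java.nio.charset import Charset
print Charset.defaultCharset()  # the platform default, e.g. UTF-8 here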

On my Linux machine it uses UTF-8, but on the customer’s Windows machine it uses latin1. Transferring files between these systems leads to invalid encoding errors in one direction and double-encoding in the other.
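
To illustrate the double-encoding direction, here is the round trip in plain Python 2 (a sketch, not Ignition-specific):

text = u'caf\xe9'                    # u'café' with an accented e
utf8 = text.encode('utf8')           # 'caf\xc3\xa9', as written on Linux
mojibake = utf8.decode('latin1')     # what a latin1 reader sees (two junk characters)
print repr(mojibake.encode('utf8'))  # 'caf\xc3\x83\xc2\xa9' (double-encoded)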

You can manually encode a unicode string in Python with .encode('utf8'). However, the result is a str object, so system.file.writeFile still applies its own encoding on top. On Linux, this produces a double-UTF-8-encoded file:

system.file.writeFile(path, output.encode('utf8'))  # the UTF-8 bytes in the str get encoded again

The only way to stop system.file.writeFile from encoding at all is to give it an array of bytes rather than a string. Unfortunately, bytearray() is not available in Python 2.5. In 2.6 you could just do this:

system.file.writeFile(path, bytearray(output.encode('utf8')))  # bytes are written as-is, no re-encoding

In 2.5 you can use the array module instead:

from array import array
# a 'b' (signed byte) array is passed through to writeFile untouched
system.file.writeFile(path, array('b', output.encode('utf8')))

Or write the file directly with Python:

from __future__ import with_statement  # with is not enabled by default in 2.5
import codecs
# codecs.open does the UTF-8 encoding itself, bypassing writeFile entirely
with codecs.open(path, 'w', 'utf8') as f:
    f.write(output)
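
In the meantime I've wrapped the workaround in a small helper, so only one place needs fixing if an encoding option ever arrives (write_file_utf8 is just my own name for it):

from array import array

def write_file_utf8(path, text):
    # encode ourselves, then hand writeFile raw bytes it cannot re-encode
    system.file.writeFile(path, array('b', text.encode('utf8')))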

But it would be really great if an encoding option could be added to system.file.writeFile.

This is a solid idea - I’ve added an internal ticket to keep track of it.

Excellent. Thanks very much.