Unlike system.file.readFileAsString, system.file.writeFile has no encoding option.
The behaviour appears to match readFileAsString called without an encoding argument: it uses Java's default charset, which is platform-specific. Some systems always default to UTF-8, while on others the charset depends on the system or user locale settings.
On my Linux machine it is UTF-8, but on the customer's Windows machine it is latin1. Transferring files between the two systems produces invalid-encoding errors in one direction and double-encoded text in the other.
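Both failure modes can be reproduced in plain Python (a sketch; 'café' stands in for any non-ASCII content):

```python
# UTF-8 bytes read back as latin1: no error, just mojibake.
utf8_bytes = u'caf\xe9'.encode('utf8')   # café encoded as UTF-8
print(utf8_bytes.decode('latin1'))       # u'caf\xc3\xa9' ("cafÃ©")

# latin1 bytes read back as UTF-8: an outright decode error.
latin1_bytes = u'caf\xe9'.encode('latin1')
try:
    latin1_bytes.decode('utf8')
except UnicodeDecodeError:
    print('invalid encoding')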
You can manually encode a unicode string in Python with .encode('utf8'). However, the result is a str object, so system.file.writeFile still does its own encoding. On Linux, this produces a double-utf8-encoded file:
system.file.writeFile(path, output.encode('utf8'))
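A sketch of why that happens, assuming writeFile treats each byte of the str as a character (effectively decoding it as latin1) before re-encoding with UTF-8:

```python
output = u'caf\xe9'               # café
utf8_once = output.encode('utf8') # b'caf\xc3\xa9'

# writeFile then treats those bytes as characters and encodes again:
utf8_twice = utf8_once.decode('latin1').encode('utf8')
print(utf8_twice)                 # b'caf\xc3\x83\xc2\xa9'
```

Every byte >= 0x80 in the UTF-8 output gets expanded into two more bytes on the second pass.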
The only way to stop system.file.writeFile from doing the encoding is to give it an array of bytes rather than a string. Unfortunately, bytearray() is not available in Python 2.5. In 2.6 you could just do this:
system.file.writeFile(path, bytearray(output.encode('utf8')))
Otherwise you can use array:
from array import array
system.file.writeFile(path, array('b', output.encode('utf8')))
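array('b') copies the raw bytes unchanged (as signed values), so nothing gets re-encoded. A quick sanity check:

```python
from array import array

data = u'caf\xe9'.encode('utf8')  # b'caf\xc3\xa9'
buf = array('b', data)            # raw copy into signed bytes

# Bytes >= 0x80 show up as negative, but the underlying bits are intact:
print(list(buf))                  # [99, 97, 102, -61, -87]
```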
Or write the file directly with Python:
from __future__ import with_statement
import codecs
with codecs.open(path, 'w', 'utf8') as file:
file.write(output)
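This side-steps writeFile entirely. A round-trip check (a sketch; the temp-file path is illustrative):

```python
from __future__ import with_statement
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')
output = u'caf\xe9'  # café

with codecs.open(path, 'w', 'utf8') as f:
    f.write(output)

# The file on disk holds exactly the UTF-8 bytes, encoded once:
with open(path, 'rb') as f:
    print(f.read())  # b'caf\xc3\xa9'
```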
But it would be really great if an encoding option could be added to system.file.writeFile.