system.file.writeFile has no encoding option.
It appears that the behaviour is the same as
system.file.readFileAsString without an encoding argument: it uses Java's default charset, which is platform-specific. On some systems it is always UTF-8, but on others it depends on the system or user locale settings.
On my Linux machine it uses UTF-8, but on the customer's Windows machine it uses latin1. Transferring files between these systems leads to invalid-encoding errors in one direction and double encoding in the other.
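Both failure modes can be sketched in plain Python (an illustrative round trip, not Jython/Ignition code): bytes written with one charset and read back with the other either come back garbled or fail to decode at all.

```python
# -*- coding: utf-8 -*-
text = u'café'

# Linux -> Windows: UTF-8 bytes interpreted as latin1 come back as mojibake
utf8_bytes = text.encode('utf8')          # b'caf\xc3\xa9'
garbled = utf8_bytes.decode('latin1')     # u'cafÃ©'

# Windows -> Linux: latin1 bytes are not valid UTF-8 and fail to decode
latin1_bytes = text.encode('latin1')      # b'caf\xe9'
try:
    latin1_bytes.decode('utf8')
except UnicodeDecodeError:
    print('invalid encoding error')
```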
You can manually encode a unicode string in Python with
.encode('utf8'). However, the result is a
str object, so
system.file.writeFile still applies its own encoding on top. On Linux, this produces a double-UTF-8-encoded string:
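The double encoding can be reproduced in plain Python (Python 3 here for illustration; in Jython 2.5 the str/unicode types play the bytes/str roles). The str returned by .encode('utf8') holds the UTF-8 byte values as character codes, and writeFile then UTF-8-encodes those characters a second time. Modelling "byte values become code points" with decode('latin1'):

```python
text = u'é'                          # U+00E9
once = text.encode('utf8')           # b'\xc3\xa9' -- what .encode('utf8') produced
# writeFile sees the bytes 0xC3 0xA9 as the characters U+00C3 U+00A9
# and encodes them to UTF-8 again:
twice = once.decode('latin1').encode('utf8')
print(twice)                         # b'\xc3\x83\xc2\xa9' -- double-encoded
```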
The only way to stop
system.file.writeFile from encoding is to give it an array of bytes rather than a string. Unfortunately, bytearray() is not available in Python 2.5. In 2.6 you could just do this:
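Presumably something like the following (a sketch: bytearray(unicode, encoding) encodes in one step, and path stands for whatever file path you would pass to writeFile):

```python
output = u'caf\xe9'                   # the unicode text to write
data = bytearray(output, 'utf8')      # encode once, as raw bytes
# system.file.writeFile(path, data)   # hypothetical call; writeFile passes bytes through untouched
```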
Otherwise you can use:

    from array import array
    system.file.writeFile(path, array('b', output.encode('utf8')))
Or write the file directly with Python:

    from __future__ import with_statement
    import codecs
    with codecs.open(path, 'w', 'utf8') as f:
        f.write(output)
But it would be really great if an encoding option could be added to system.file.writeFile.