system.net.httpClient().request() is doing some uncommanded encoding on my binary data

mmunin · March 28, 2021, 10:02pm

I would like to have my Vision client user select a file, and then use system.net.httpClient() to POST the contents of that file via HTTP.

According to the documentation, the following should work.

h = system.net.httpClient()
path = r'C:\pathtomyfile'
bytes = system.file.readFileAsBytes(path).tostring() # returns a byte string of the file data
print(len(bytes)) # sanity check to make sure that our byte string has been imported correctly into memory
h.request("http://example.com", "POST", data=bytes)

When I try this with a 2x2 pixel PNG image file, the script outputs the correct number of bytes, which is 128.

However, my files are ending up at my back end system corrupted. This corruption applies to all files except ones that are valid ASCII.

I have HTTP Toolkit installed, and when I configure system.net.httpClient() to use it as a proxy, I can see from the payload hex dump that the HTTP request sent to my back end system was 152 bytes. For ASCII files, these numbers match.

Does anyone know why it might be doing this, and if there’s a work around?

Kevin.Herron · March 28, 2021, 10:05pm

You're doing it at this step, the tostring() call. You can't go from arbitrary binary to UTF-8 and back to binary and expect it to survive intact.

Try Base64 encoding your payload before sending it and Base64 decoding on the backend.

edit: or just don't call tostring() first, the data parameter to the POST can be a byte array.
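
A minimal sketch of the Base64 idea in plain Python, not Ignition-specific code. The five-byte payload is only an illustrative assumption; any non-ASCII bytes would do. It shows that the encoded form contains nothing above 0x7F, so a text-oriented transport has nothing to re-encode, and that decoding recovers the original bytes.

```python
import base64

# Illustrative payload containing bytes above 0x7F (an assumption for this sketch).
raw = bytearray([0xC3, 0xA9, 0x74, 0xC3, 0xA9])

# Base64 output uses only ASCII characters, so a text-oriented
# transport cannot alter it by re-encoding.
encoded = base64.b64encode(bytes(raw))

# The receiving side reverses the step to recover the original bytes.
decoded = bytearray(base64.b64decode(encoded))

print(all(b < 128 for b in bytearray(encoded)))
print(decoded == raw)
```

The cost is roughly a one-third increase in payload size, plus a decode step that the back end has to implement.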

mmunin · March 28, 2021, 11:43pm

According to the Python 2.7 documentation, tostring() will return a string of bytes, as though it were being written as a file.

Since posting this issue, I have done an additional test. When trying to send a UTF-8 encoded file that is NOT ASCII compatible, there is still corruption of the data after the call to tostring().

When the file contains the text été, encoded in UTF-8 as C3 A9 74 C3 A9, the call to len(bytes) returns 5 as expected. However, my HTTP Toolkit is intercepting a payload of c3 83 c2 a9 74 c3 83 c2 a9. The fact that len(bytes) showed 5 and not 9 leads me to believe that the tostring() method is not doing any undesired en/decoding.

According to this page, these specific results I see are consistent with the data being subject to a conversion from ISO-8859-1 to UTF-8. It seems as though passing the bytes as a string will subject them to some decoding/encoding regardless. The only byte strings I can put through there that come out unchanged are ones that are only chr(0) to chr(127)
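
The conversion described above can be reproduced in plain Python, independent of Ignition. This is only a sketch of the suspected transformation (each byte read as an ISO-8859-1 character, then the text written out as UTF-8), not a statement of what httpClient does internally; hex_dump is a helper introduced here for illustration.

```python
def hex_dump(data):
    """Format a byte sequence as space-separated lowercase hex pairs."""
    return " ".join("%02x" % b for b in bytearray(data))

# The five bytes from the test file: "été" encoded as UTF-8.
original = bytearray([0xC3, 0xA9, 0x74, 0xC3, 0xA9])

# Treat every byte as one ISO-8859-1 character...
as_text = original.decode("iso-8859-1")

# ...then encode that text as UTF-8. Each byte above 0x7F becomes two bytes.
re_encoded = as_text.encode("utf-8")

print(hex_dump(original))    # c3 a9 74 c3 a9
print(hex_dump(re_encoded))  # c3 83 c2 a9 74 c3 83 c2 a9
```

Bytes in the range 0x00 to 0x7F map to themselves under both encodings, which would explain why pure ASCII files arrive intact.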

The only reason I’m even using tostring() is because I did not observe correct behavior at all when trying to pass the byte array directly (by removing .tostring() from line 3 above). When I do, the HTTP request has content-type: application/json; charset=utf-8 with JSON content [-61, -87, 116, -61, -87].

When I was first trying this, I thought that I was doing something wrong by passing the byte array directly, so I started looking for other solutions. It would seem the “right” way isn’t working for me. Any ideas?

Kevin.Herron · March 28, 2021, 11:49pm

Try passing the byte array as the data parameter and explicitly setting the content-type header to application/octet-stream. It sounds like the content type is not being automatically detected and set as it should be.

mmunin · March 28, 2021, 11:50pm

Unfortunately, trying this leaves the payload unchanged (except that HTTP Toolkit now won’t let me view it as JSON). That is to say, the binary content of the HTTP request is still the same UTF-8 encoded JSON.

Kevin.Herron · March 29, 2021, 12:00am

I’m not sure I understand what you are trying to do any more.

Are you trying to post JSON to an endpoint or post binary content (like the PNG file from your first post)?

mmunin · March 29, 2021, 12:07am

My apologies, I am attempting to post binary data. For debugging purposes, I have since switched to a much simpler file so I can inspect the hex data directly. In HTTP Toolkit, if a request has the JSON content-type (what I got without providing the octet-stream header), it gives you a little browser to navigate the JSON. The only difference setting the octet-stream header made was to change the content-type in the request. The data was identical: 21 bytes for the 21 characters in [-61,-87,116,-61,-87].
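
For reference, a small plain-Python sketch of how those numbers relate. Java bytes are signed, so the byte array prints as values from -128 to 127; masking with 0xFF recovers the usual unsigned view. The compact JSON separators are an assumption chosen to match the 21-character body quoted above.

```python
import json

# The signed values shown in the JSON body.
signed = [-61, -87, 116, -61, -87]

# Masking with 0xFF maps each signed Java byte to its unsigned value.
unsigned = [b & 0xFF for b in signed]
hex_view = " ".join("%02x" % b for b in unsigned)

# Serializing the list as compact JSON text instead of sending the raw bytes.
json_body = json.dumps(signed, separators=(",", ":"))

print(hex_view)        # c3 a9 74 c3 a9
print(json_body)       # [-61,-87,116,-61,-87]
print(len(json_body))  # 21
```

So the 5 bytes of binary data go out as 21 bytes of text describing them.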

I’m not sure if you have an HTTP debugger tool like HTTP Toolkit, but you could try the following equivalent code without any need for files to see if you can replicate it.

h = system.net.httpClient(proxy="http://localhost:8000") # HTTP Toolkit is running on port 8000 on the same host as this designer/client
import jarray
bin = [-61, -87, 116, -61, -87] # 5 bytes of binary data that encode "été" in UTF-8
bytes = jarray.array(bin, 'b') # create a Jython Byte[]
print bytes
h.request("http://example.domain","POST",data=bytes,headers={"content-type":"application/octet-stream"})

Kevin.Herron · March 29, 2021, 12:19am

Okay, sorry, the problem is that system.net.httpClient is utterly broken when it comes to trying to use binary data. I think we’ll have to fix it.

As a workaround you can do something like this:

from org.python.core import PyByteArray

h = system.net.httpClient(proxy="http://localhost:8000") # HTTP Toolkit is running on port 8000 on the same host as this designer/client
import jarray
bin = [-61, -87, 116, -61, -87] # 5 bytes of binary data that encode "été" in UTF-8
bytes = jarray.array(bin, 'b') # create a Jython Byte[]
data = PyByteArray(bytes)
h.request("http://example.domain","POST",data=data,headers={"content-type":"application/octet-stream"})

mmunin · March 29, 2021, 12:21am

Thank you so much! I now officially count you among my top 2 favorite Kevins!

But…

I should let you know, your workaround preserves the data but adds a bunch of leading and trailing 0’s to it, no clue why. If you had to guess, will this be fixed in 8.1.4 or 8.1.5?

Kevin.Herron · March 29, 2021, 12:56am

Certainly not 8.1.4, possibly 8.1.5. I opened an issue for it that should get some attention this week.

mmunin · September 22, 2021, 5:41pm

Was this the fix?

paul-griffith · September 22, 2021, 6:13pm

No, actually, it’s this:
https://inductiveautomation.com/downloads/releasenotes/8.1.9#17176

Trent.Boudreaux · January 28, 2022, 5:09pm

I’m seeing this behavior when using Python’s built-in open() function in Ignition 8.1.11.

fileReader = open(filePath,"rb")
fileData = fileReader.read()
fileReader.close()
...
client = system.net.httpClient()
response = client.put(url=urlCurr, data=fileData, headers=headerVals)

However, I was able to work around the issue by switching to system.file.readFileAsBytes().

pturmel · January 28, 2022, 5:17pm

When there are Jython functions and Java functions that do the same things, use the Java functions. Jython’s stdlib has many flaws due to its Python 2 heritage. Encoding problems are one of them.