System.perspective.download Encoding

victor1 · September 10, 2020, 2:32pm

I’m trying to download a PDF, but when I open it, every page is blank.
If I open it in Notepad and compare it with the original PDF, the contents are almost the same, so I think the problem is the encoding.

Original:

Downloaded:

For the downloaded file, Notepad always shows UTF-8 in the bottom right corner. How can I change this?

My code:

    access_token = 'ya29.a0Af...'

    uri = 'https://www.googleapis.com/drive/v3/files/1-fS4Bgy7NCdM8Frsed1YNOTaR_8majxm?alt=media'
    headers = {'Authorization': 'Bearer ' + access_token}
    file = system.net.httpGet(url = uri, headerValues = headers)

    system.perspective.download('SPM AWS.pdf', file, "application/pdf")

PGriffith · September 10, 2020, 3:04pm

The problem is probably with httpGet, not system.perspective.download - once you’re downloading the file, there’s no encoding.

Since you’re already on 8.0, I would highly recommend migrating to system.net.httpClient.
Unlike httpGet, it tries to use the charset from the Content-Type header in the response, and if none is found, it falls back to UTF-8. system.net.httpGet() instead falls back to the platform’s default charset (which is UTF-8 on basically every platform except Windows).

It would be a drop-in replacement for the existing code you have:

    uri = 'https://www.googleapis.com/drive/v3/files/1-fS4Bgy7NCdM8Frsed1YNOTaR_8majxm?alt=media'
    headers = {'Authorization': 'Bearer ' + access_token}
    file = system.net.httpClient().get(uri, headers=headers).body
    # file = system.net.httpGet(url = uri, headerValues = headers)
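
To illustrate why a text-oriented fetch can blank out a PDF, here is a minimal sketch in plain Python 3 (not Ignition's Jython). The sample bytes and the choice of cp1252 as the "platform default" are assumptions for illustration. It decodes binary content as text and writes it back as UTF-8, which is roughly what happens when a string response is passed to a download.

```python
# Hypothetical sample data: the start of a PDF, i.e. the header line plus the
# binary comment line (bytes above 0x7F) that PDF writers commonly emit.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# A text-oriented fetch decodes the bytes with some charset (cp1252 is an
# assumed Windows default here); the text is later written out as UTF-8.
as_text = pdf_bytes.decode("cp1252")
rewritten = as_text.encode("utf-8")

# Decoding as UTF-8 is no better for binary data: invalid byte sequences
# are replaced with U+FFFD, so the original bytes cannot be recovered.
as_utf8_text = pdf_bytes.decode("utf-8", errors="replace")
rewritten_utf8 = as_utf8_text.encode("utf-8")

changed_via_cp1252 = rewritten != pdf_bytes
changed_via_utf8 = rewritten_utf8 != pdf_bytes

print(len(pdf_bytes), len(rewritten), len(rewritten_utf8))
```

In both cases the ASCII parts survive, which matches the "almost right" look in Notepad, while the binary parts change. Keeping the response as raw bytes from fetch to download avoids the decode step entirely.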

victor1 · September 10, 2020, 3:21pm

Nice!!

Worked perfectly.
Thank you very much!

victor1 · September 11, 2020, 4:30pm

@PGriffith, I’m now facing a very similar problem, this time when uploading a file. Could you help me?

I’m using the ‘File Upload’ component in Perspective, trying to upload a file to Google Drive.
The file arrives in the Google Drive folder, but with the wrong encoding.

onFileReceived script event:

	import mimetypes
	
	access_token = self.getSibling("txt_accessToken").props.text	
	dataList = []
	boundary = 'wL36Yn8afVp8Ag7AmP8qZ0SA4n1v9T'
	
	#Metadata file
	dataList.append('--' + boundary)
	dataList.append('Content-Disposition: form-data; name=""; filename="{0}"'.format('/usr/local/bin/ignition/webserver/webapps/main/metadata.json'))	
	fileType = mimetypes.guess_type('/usr/local/bin/ignition/webserver/webapps/main/metadata.json')[0] or 'application/octet-stream'
	dataList.append('Content-Type: {}'.format(fileType))
	dataList.append('')
	metadata = system.file.readFileAsString('/usr/local/bin/ignition/webserver/webapps/main/metadata.json')
	dataList.append(metadata)	
	
	#PDF file
	dataList.append('--' + boundary)		
	dataList.append('Content-Disposition: form-data; name=""; filename="{0}"'.format(event.file.name))	
	fileType = mimetypes.guess_type(event.file.name)[0] or 'application/octet-stream'
	dataList.append('Content-Type: {}'.format(fileType))
	dataList.append('')
	dataList.append(event.file.getString())
		  
	dataList.append('--'+boundary+'--')
	dataList.append('')
	
	body = '\r\n'.join(dataList)
	payload = body
	
	self.getSibling("txt_requestBody").props.text = payload
	
	uri = 'https://www.googleapis.com/upload/drive/v3/files'
	headers = {'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'Bearer ' + access_token, 'Content-type': 'multipart/form-data; boundary={}'.format(boundary)}
	
	response = system.net.httpClient().post(uri, data=payload, headers=headers)
	
	logger = system.util.getLogger("myLogger")
	logger.info(str(response))

Original PDF file opened in Notepad:

Uploaded file:

PGriffith · September 11, 2020, 4:53pm

event.file.getString() is implicitly using the default charset. Add "UTF-8" to the getString() call.

pturmel · September 11, 2020, 5:04pm

Why are you not using .getBytes() if you are just saving the file to the filesystem? Converting from bytes to a string and back to bytes is not guaranteed to be lossless.

victor1 · September 11, 2020, 5:08pm

I added the charset parameter, but I’m still getting the encoding error:

    dataList.append(event.file.getString("UTF-8"))

victor1 · September 11, 2020, 5:11pm

But I am not saving to the file system.
I’m trying to upload the file via HTTP to the Google Drive API.

pturmel · September 11, 2020, 5:15pm

You should be constructing your upload with binary (probably base64) encoding of the bytes, not a string. PDFs have binary content and will not decode/reencode properly as strings.

PGriffith · September 11, 2020, 5:59pm

Ah, yeah, Phil’s right. You need to retrieve the bytes and encode them yourself into base64 to place into the multipart/form-data you’re building.
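
As a rough sketch of that suggestion, the snippet below builds a multipart body whose file part is base64 text. It is plain Python 3 rather than an Ignition event script. `build_multipart_body`, the `example.pdf` metadata, and the sample bytes are hypothetical. The `Content-Transfer-Encoding: base64` header is an assumption about what the receiving server honors, not something confirmed in this thread.

```python
import base64
import json


def build_multipart_body(metadata, file_bytes, mime_type, boundary):
    """Return a multipart body as ASCII text, with the file part base64-encoded.

    Hypothetical helper for illustration; not part of Ignition or any Google library.
    """
    # b64encode returns bytes in Python 3; decode to ASCII text so it can be
    # joined with the other string parts.
    encoded_file = base64.b64encode(file_bytes).decode("ascii")
    parts = [
        "--" + boundary,
        "Content-Type: application/json; charset=UTF-8",
        "",
        json.dumps(metadata),
        "--" + boundary,
        "Content-Type: " + mime_type,
        # Assumption: the server decodes the part when it sees this header.
        "Content-Transfer-Encoding: base64",
        "",
        encoded_file,
        "--" + boundary + "--",
        "",
    ]
    return "\r\n".join(parts)


boundary = "wL36Yn8afVp8Ag7AmP8qZ0SA4n1v9T"
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"  # stand-in for event.file.getBytes()
body = build_multipart_body({"name": "example.pdf"}, pdf_bytes, "application/pdf", boundary)

# The whole body is plain ASCII, so no charset conversion can alter it.
body_bytes = body.encode("ascii")

# Recover the file part the way a receiver would, to confirm nothing was lost.
file_part = body.split("Content-Transfer-Encoding: base64\r\n\r\n")[1].split("\r\n--")[0]
recovered = base64.b64decode(file_part)
```

Two things may be worth checking when adapting this to the script above. The headers dictionary there carries two differently cased Content-Type keys, and Google's upload documentation describes multipart uploads in terms of `uploadType=multipart` with a `multipart/related` content type.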

victor1 · September 11, 2020, 6:28pm

I made the following attempts:

    dataList.append(base64.b64encode(event.file.getBytes()))
    dataList.append(base64.b64decode(event.file.getBytes()))
    dataList.append(base64.b64encode(event.file.getBytes()).encode('utf-8'))
    dataList.append(base64.b64encode(event.file.getBytes()).decode('utf-8'))
    dataList.append(base64.b64decode(event.file.getBytes()).encode('utf-8'))
    dataList.append(base64.b64decode(event.file.getBytes()).decode('utf-8'))

but none of them worked …
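
For reference, here is a small plain Python 3 sketch of what these calls compute. Jython 2.7 string semantics differ, and the sample bytes are hypothetical. It suggests the six attempts reduce to two cases: `b64encode` turns the file into ASCII text, and a UTF-8 step after it changes nothing. `b64decode` is the inverse and expects base64 text as input, not raw file bytes.

```python
import base64

raw = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"  # hypothetical stand-in for event.file.getBytes()

encoded = base64.b64encode(raw)     # bytes containing only base64 characters
as_text = encoded.decode("ascii")   # the same characters as a str

# Re-encoding ASCII-only text as UTF-8 yields identical bytes, so a UTF-8
# encode/decode after b64encode does not change what is sent.
same_after_utf8 = as_text.encode("utf-8") == encoded

# b64decode undoes b64encode; it belongs on the receiving side.
restored = base64.b64decode(encoded)
```

If the `b64encode` variants still produce a broken file, one possible explanation is that the part is stored as literal base64 text because nothing in the part headers tells the server to decode it. That is an assumption, not something established in this thread.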