system.perspective.download Encoding

I’m trying to download a PDF, but when I open it, every page is blank.
If I open it in Notepad and compare it with the original PDF, the contents are almost identical, so I think the problem is the encoding.

Original:

Downloaded:

When I open the downloaded file, the bottom-right corner of Notepad always shows UTF-8. How can I change this?

My code:

    access_token = 'ya29.a0Af...'

    uri = 'https://www.googleapis.com/drive/v3/files/1-fS4Bgy7NCdM8Frsed1YNOTaR_8majxm?alt=media'
    headers = {'Authorization': 'Bearer ' + access_token}
    file = system.net.httpGet(url=uri, headerValues=headers)

    system.perspective.download('SPM AWS.pdf', file, "application/pdf")

The problem is probably with httpGet, not system.perspective.download - once you’re downloading the file, there’s no encoding.

Since you’re already on 8.0, I would highly recommend migrating to system.net.httpClient.
Unlike httpGet, it tries to use the charset from the Content-Type header of the response and, if none is found, falls back to UTF-8. system.net.httpGet() instead falls back to the platform’s default charset, which is UTF-8 on basically every platform except Windows.

It would be a drop-in replacement for the existing code you have:

    uri = 'https://www.googleapis.com/drive/v3/files/1-fS4Bgy7NCdM8Frsed1YNOTaR_8majxm?alt=media'
    headers = {'Authorization': 'Bearer ' + access_token}
    file = system.net.httpClient().get(uri, headers=headers).body
    # file = system.net.httpGet(url = uri, headerValues = headers)
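
To make the fallback order described above concrete, here is a minimal sketch in plain Python, run outside Ignition. It is not Ignition's actual implementation; the function name pick_charset is hypothetical and only illustrates the rule "use the charset from Content-Type, otherwise UTF-8".

```python
def pick_charset(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or the default.

    Hypothetical helper for illustration only; not part of Ignition.
    """
    if content_type:
        # Parameters follow the media type, separated by semicolons.
        for param in content_type.split(";")[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset" and value.strip():
                return value.strip().strip('"').lower()
    return default


print(pick_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(pick_charset("application/pdf"))                # utf-8
```

A response such as application/pdf carries no charset parameter, so the second call takes the fallback path.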

Nice!!

Worked perfectly.
Thank you very much!

@PGriffith, I’m facing a very similar problem right now, but when uploading a file. Could you help me?

I’m using the ‘File Upload’ component in Perspective to upload a file to Google Drive.
The file arrives in the Google Drive folder, but with the wrong encoding.

onFileReceived script event:

	import mimetypes
	
	access_token = self.getSibling("txt_accessToken").props.text	
	dataList = []
	boundary = 'wL36Yn8afVp8Ag7AmP8qZ0SA4n1v9T'
	
	#Metadata file
	dataList.append('--' + boundary)
	dataList.append('Content-Disposition: form-data; name=""; filename="{0}"'.format('/usr/local/bin/ignition/webserver/webapps/main/metadata.json'))	
	fileType = mimetypes.guess_type('/usr/local/bin/ignition/webserver/webapps/main/metadata.json')[0] or 'application/octet-stream'
	dataList.append('Content-Type: {}'.format(fileType))
	dataList.append('')
	metadata = system.file.readFileAsString('/usr/local/bin/ignition/webserver/webapps/main/metadata.json')
	dataList.append(metadata)	
	
	#PDF file
	dataList.append('--' + boundary)		
	dataList.append('Content-Disposition: form-data; name=""; filename="{0}"'.format(event.file.name))	
	fileType = mimetypes.guess_type(event.file.name)[0] or 'application/octet-stream'
	dataList.append('Content-Type: {}'.format(fileType))
	dataList.append('')
	dataList.append(event.file.getString())
		  
	dataList.append('--'+boundary+'--')
	dataList.append('')
	
	body = '\r\n'.join(dataList)
	payload = body
	
	self.getSibling("txt_requestBody").props.text = payload
	
	uri = 'https://www.googleapis.com/upload/drive/v3/files'
	headers = {'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'Bearer ' + access_token, 'Content-type': 'multipart/form-data; boundary={}'.format(boundary)}
	
	response = system.net.httpClient().post(uri, data=payload, headers=headers)
	
	logger = system.util.getLogger("myLogger")
	logger.info(str(response))

Original PDF file opened in Notepad:

Uploaded file:

event.file.getString() is implicitly using the default charset. Add "UTF-8" to the getString() call.

Why are you not using .getBytes() if you are just saving the file to the filesystem? Converting from bytes to a string and back to bytes is not guaranteed to be lossless.
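
A small sketch of why the round trip through a string is risky for binary data, written in plain Python 3 and run outside Ignition. The sample bytes are an assumption: they imitate the start of a PDF, where the second line usually holds a few bytes above 0x7F.

```python
# Illustrative bytes resembling the start of a PDF file (assumed sample data).
original = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Decoding as UTF-8 cannot represent the invalid sequences, so each one is
# replaced by U+FFFD (the replacement character).
text = original.decode("utf-8", errors="replace")

# Encoding again writes every U+FFFD as the three bytes EF BF BD, so the
# result no longer matches the input.
round_tripped = text.encode("utf-8")

print(round_tripped == original)  # False
```

Once the replacement characters are in the string, the original byte values cannot be recovered, which matches the "almost right" contents seen in Notepad.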

I added the charset parameter, but I’m still getting the encoding error:

    dataList.append(event.file.getString("UTF-8"))

But I am not saving to the file system.
I’m trying to upload the file via HTTP to the Google Drive API.

You should be constructing your upload with binary (probably base64) encoding of the bytes, not a string. PDFs have binary content and will not decode/reencode properly as strings.


Ah, yeah, Phil’s right. You need to retrieve the bytes and encode them yourself into base64 to place into the multipart/form-data you’re building.
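
A rough sketch of that idea in plain Python 3, run outside Ignition. The helper name build_multipart and the field name "file" are hypothetical, and the sketch assumes the receiving server honors a Content-Transfer-Encoding: base64 header on the part; check the API documentation before relying on that.

```python
import base64


def build_multipart(boundary, metadata_json, file_name, file_bytes, mime_type):
    """Assemble a two-part body: JSON metadata plus a base64-encoded file.

    Hypothetical helper; assumes the server decodes base64 parts.
    """
    # base64 output is pure ASCII, so it is safe to place in a text body.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    lines = [
        "--" + boundary,
        "Content-Type: application/json; charset=UTF-8",
        "",
        metadata_json,
        "--" + boundary,
        'Content-Disposition: form-data; name="file"; filename="{0}"'.format(file_name),
        "Content-Type: " + mime_type,
        "Content-Transfer-Encoding: base64",
        "",
        encoded,
        "--" + boundary + "--",
        "",
    ]
    return "\r\n".join(lines)


body = build_multipart(
    "wL36Yn8afVp8Ag7AmP8qZ0SA4n1v9T",
    '{"name": "example.pdf"}',
    "example.pdf",
    b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n",
    "application/pdf",
)
```

Because the file travels as ASCII text, no charset conversion along the way can alter it; the cost is a payload roughly one third larger than the raw bytes.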

I made the following attempts:

    dataList.append(base64.b64encode(event.file.getBytes()))
    dataList.append(base64.b64decode(event.file.getBytes()))
    dataList.append(base64.b64encode(event.file.getBytes()).encode('utf-8'))
    dataList.append(base64.b64encode(event.file.getBytes()).decode('utf-8'))
    dataList.append(base64.b64decode(event.file.getBytes()).encode('utf-8'))
    dataList.append(base64.b64decode(event.file.getBytes()).decode('utf-8'))

but none of them worked …