UTF-8 script error

adriangarridob · February 2, 2017, 11:22am

Hi, I have a problem with a script. I always get the same error with the last defined variable, as shown below.

//error
Traceback (most recent call last):
  File "event:actionPerformed", line 25, in
  File "C:\Users\adria.ignition\cache\gwlocalhost_8088_8043_main\C1\pylib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-12: invalid data

//code
import time
import java
from java.io import File
from java.lang import *

pdf=event.source.parent.getComponent('pdf').text
archivo = open(pdf, "rb")
data = archivo.read()

if len(event.source.parent.parent.getComponent('incidencia').text) <1:
    event.source.parent.getComponent('asterisco').visible=1
    event.source.parent.getComponent('errorRojo').visible=1
else:
    event.source.parent.getComponent('asterisco').visible=0
    event.source.parent.getComponent('errorRojo').visible=0
    incidencia=event.source.parent.parent.getComponent('incidencia').text
    t_error=event.source.parent.getComponent('error').selectedStringValue
    f_inicio=event.source.parent.getComponent('f_inicio').formattedDate
    observaciones=event.source.parent.getComponent('observaciones').text
    f_sistema=time.strftime("%d/%m/%y %H:%M:%S")
    event.source.parent.getComponent('observaciones').text=""
    event.source.parent.getComponent('pdf').text=""
    event.source.parent.parent.getComponent('incidencia').text=""
    DNI=event.source.parent.parent.DNI
    query="INSERT INTO incidencias (tipo_error, tipo_incidencia, observaciones,f_inicio, f_sistema, dni_operario, solucionado, doc_pdf) VALUES ('%s','%s','%s','%s','%s','%s','%s','%s')" %(t_error, incidencia, observaciones, f_inicio, f_sistema,DNI ,"No",data)
    results=system.db.runPrepUpdate(query,[],'DB')

archivo.close()
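Counting from import time, line 25 of the script above (assuming the paste is complete) is the line that builds query with %. In Jython 2, when % mixes unicode values (component text) with a byte string such as data, the byte string is most likely decoded implicitly, here with UTF-8 according to the traceback. The same decode failure can be reproduced with a minimal sketch in plain Python 3 (not Ignition code); the sample bytes are hypothetical, shaped like the start of a typical PDF, whose header line is usually followed by a comment line of non-ASCII bytes.

```python
# Plain Python 3 sketch; NOT Ignition/Jython code.
# Hypothetical sample: bytes shaped like the start of a typical PDF file,
# i.e. "%PDF-1.4", a newline, then a comment line of high (non-ASCII) bytes.
PDF_START = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def try_utf8(raw):
    """Decode raw bytes as UTF-8.

    Returns (True, text) on success, or (False, offset) where offset is the
    position of the first byte the UTF-8 decoder rejected.
    """
    try:
        return True, raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return False, exc.start


print(try_utf8(b"%PDF-1.4\n"))  # (True, '%PDF-1.4\n')
print(try_utf8(PDF_START))      # (False, 10)
```

In this sample the binary comment starts at offset 10, which is consistent with "position 10-12" in the traceback, although the bytes of the actual file are unknown.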

JGJohnson · February 2, 2017, 1:48pm

runPrepUpdate expects a query containing ? placeholders plus a separate list of arguments, one per placeholder. Putting all of your data into the query string defeats the purpose of using runPrepUpdate.

Try this:

[code]
query = "INSERT INTO incidencias (tipo_error, tipo_incidencia, observaciones,f_inicio, f_sistema, dni_operario, solucionado, doc_pdf) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"

results=system.db.runPrepUpdate(query,[t_error, incidencia, observaciones, f_inicio, f_sistema, DNI ,"No", data],'DB')[/code]

Your data is now
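To illustrate why the values belong in the argument list, here is a minimal Python 3 sketch that uses the built-in sqlite3 module as a stand-in for system.db.runPrepUpdate (an assumption: sqlite3 also uses ? placeholders, but it is not Ignition's API). The trimmed-down incidencias table is hypothetical. An observation containing an apostrophe breaks the string-built query, while the parameterised version stores it unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for the real 'DB' connection.
conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down version of the incidencias table.
conn.execute("CREATE TABLE incidencias (tipo_error TEXT, observaciones TEXT)")

t_error = "Mecanico"
observaciones = "It's broken"  # the apostrophe closes the '%s' literal early


def insert_by_formatting():
    """The pattern from the question: values pasted into the SQL text."""
    query = "INSERT INTO incidencias (tipo_error, observaciones) VALUES ('%s', '%s')"
    conn.execute(query % (t_error, observaciones))


def insert_with_params():
    """Values passed separately; the driver binds each one to a '?'."""
    query = "INSERT INTO incidencias (tipo_error, observaciones) VALUES (?, ?)"
    conn.execute(query, [t_error, observaciones])


try:
    insert_by_formatting()
    formatted_ok = True
except sqlite3.OperationalError:  # SQL syntax error caused by the stray quote
    formatted_ok = False

insert_with_params()
stored = conn.execute("SELECT observaciones FROM incidencias").fetchall()
print(formatted_ok)  # False
print(stored)        # [("It's broken",)]
```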

Sanderd17 · February 3, 2017, 1:27pm

A PDF file is a binary file (or at least has binary parts), where random bit patterns can occur.

UTF-8, on the other hand (like quite a few other text encodings), has invalid byte patterns. See en.wikipedia.org/wiki/UTF-8#Invalid_code_points . These are byte sequences that may never appear in valid text, which keeps the text decodable and compatible with other standards.
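A quick way to see these forbidden patterns is to feed a few of them to a strict UTF-8 decoder. This is a minimal Python 3 sketch; the sample byte strings are standard textbook cases, not bytes taken from the PDF in question.

```python
# Each sample is a byte sequence that strict UTF-8 forbids.
INVALID = {
    "lone continuation byte": b"\x80",
    "byte never used in UTF-8": b"\xff",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate": b"\xed\xa0\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
}
VALID = b"\xe2\x82\xac"  # the euro sign, a well-formed 3-byte sequence


def is_valid_utf8(raw):
    """True if raw decodes as strict UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name, raw in INVALID.items():
    print(name, is_valid_utf8(raw))  # every line ends in False
print("euro sign", is_valid_utf8(VALID))  # euro sign True
```

Random binary data, such as the contents of a PDF, will almost certainly contain some of these patterns somewhere.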

So a PDF file shouldn't be treated as text. Even when the query is parametrised, I guess the DB schema still treats the column as text, and it won't work as expected.

There are ways to convert binary data to clean text, such as base64 encoding, which uses 64 safe characters to encode groups of 6 bits (2^6 = 64). These encodings do waste some space (a safe character takes 8 bits in UTF-8 but encodes only 6 bits, so the data grows by about a third), but they are a good option when only plain-text communication or storage is available.
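For example, with Python's standard base64 module (a minimal Python 3 sketch; the 300 random bytes stand in for binary content), every 3 input bytes become 4 ASCII characters, and decoding restores the original bytes exactly. The b64_length helper is only there to show the size arithmetic.

```python
import base64
import os


def b64_length(n_bytes):
    """Characters produced by base64 for n_bytes of input (padding included)."""
    return 4 * ((n_bytes + 2) // 3)


raw = os.urandom(300)  # 300 arbitrary bytes standing in for binary content
text = base64.b64encode(raw).decode("ascii")  # only A-Z, a-z, 0-9, '+', '/', '='

print(len(raw), len(text), b64_length(len(raw)))  # 300 400 400
print(base64.b64decode(text) == raw)  # True: the round trip is lossless
```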

But I guess it would be a better option to not store raw PDF documents in a database at all. Store them in a structured way on the HDD, and only keep a reference to it in your database (like the file path).
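A minimal Python 3 sketch of that approach, assuming a local storage folder and a hypothetical documentos table with a ruta (path) column; sqlite3 and a temporary directory stand in for the real database and file store. Naming files by a SHA-256 hash of their content and spreading them over subfolders is just one possible "structured way", chosen here for illustration.

```python
import hashlib
import os
import shutil
import sqlite3
import tempfile

# Temporary folder standing in for the real document store (assumption).
STORE = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
# Hypothetical table that keeps only a reference to each file.
conn.execute("CREATE TABLE documentos (id INTEGER PRIMARY KEY, ruta TEXT)")


def save_pdf(data):
    """Write the PDF bytes to STORE and record the path; returns the row id."""
    name = hashlib.sha256(data).hexdigest() + ".pdf"  # content-based file name
    sub = os.path.join(STORE, name[:2])  # spread files over subfolders
    os.makedirs(sub, exist_ok=True)
    path = os.path.join(sub, name)
    with open(path, "wb") as f:  # binary mode: bytes are written untouched
        f.write(data)
    cur = conn.execute("INSERT INTO documentos (ruta) VALUES (?)", [path])
    return cur.lastrowid


def load_pdf(doc_id):
    """Look up the stored path and read the file back as bytes."""
    (path,) = conn.execute(
        "SELECT ruta FROM documentos WHERE id = ?", [doc_id]
    ).fetchone()
    with open(path, "rb") as f:
        return f.read()


pdf = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"  # hypothetical sample content
doc_id = save_pdf(pdf)
roundtrip_ok = load_pdf(doc_id) == pdf
print(roundtrip_ok)  # True
shutil.rmtree(STORE)  # clean up the temporary folder
```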

pturmel · February 3, 2017, 2:31pm

[quote="Sanderd17"]But I guess it would be a better option to not store raw PDF documents in a database at all. Store them in a structured way on the HDD, and only keep a reference to it in your database (like the file path).[/quote]
Storing true binary data in modern databases is perfectly fine as long as you actually use a BLOB or BINARY column type, and properly use '?' substitution to push the binary content to the database.
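A minimal Python 3 sketch of that advice, using the built-in sqlite3 module as a stand-in for the real database (an assumption; the table is a hypothetical, trimmed-down version of incidencias). The sample bytes contain sequences that are not valid UTF-8, yet they come back unchanged because the column is a BLOB and the value is bound to a ? placeholder instead of being pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
# Hypothetical table; doc_pdf is a BLOB column, not a text column.
conn.execute("CREATE TABLE incidencias (id INTEGER PRIMARY KEY, doc_pdf BLOB)")

# Hypothetical PDF-like bytes; b"\xe2\xe3" is not valid UTF-8.
data = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# The bytes go in the parameter list and are bound to '?' as binary data.
conn.execute("INSERT INTO incidencias (doc_pdf) VALUES (?)", [data])

stored, kind = conn.execute(
    "SELECT doc_pdf, typeof(doc_pdf) FROM incidencias"
).fetchone()
print(kind)            # blob
print(stored == data)  # True
```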

KathyApplebaum · February 3, 2017, 3:34pm

What Phil said. I’ve stored literally millions of images and PDFs in databases and had zero problems. But I always store them as a blob.