UTF-8 script error

Hi, I have a problem with a script. I always get the same error with the last defined variable, as shown below.

//error
[code]
Traceback (most recent call last):
  File "event:actionPerformed", line 25, in
  File "C:\Users\adria.ignition\cache\gwlocalhost_8088_8043_main\C1\pylib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-12: invalid data
[/code]

//code
[code]
import time
import java
from java.io import File
from java.lang import *

pdf=event.source.parent.getComponent('pdf').text
archivo = open(pdf, "rb")
data = archivo.read()

if len(event.source.parent.parent.getComponent('incidencia').text) <1:
    event.source.parent.getComponent('asterisco').visible=1
    event.source.parent.getComponent('errorRojo').visible=1
else:
    event.source.parent.getComponent('asterisco').visible=0
    event.source.parent.getComponent('errorRojo').visible=0
    incidencia=event.source.parent.parent.getComponent('incidencia').text
    t_error=event.source.parent.getComponent('error').selectedStringValue
    f_inicio=event.source.parent.getComponent('f_inicio').formattedDate
    observaciones=event.source.parent.getComponent('observaciones').text
    f_sistema=time.strftime("%d/%m/%y %H:%M:%S")
    event.source.parent.getComponent('observaciones').text=""
    event.source.parent.getComponent('pdf').text=""
    event.source.parent.parent.getComponent('incidencia').text=""
    DNI=event.source.parent.parent.DNI
    query="INSERT INTO incidencias (tipo_error, tipo_incidencia, observaciones,f_inicio, f_sistema, dni_operario, solucionado, doc_pdf) VALUES ('%s','%s','%s','%s','%s','%s','%s','%s')" %(t_error, incidencia, observaciones, f_inicio, f_sistema,DNI ,"No",data)
    results=system.db.runPrepUpdate(query,[],'DB')

archivo.close()
[/code]

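runPrepUpdate requires a list of arguments: the query should contain ? placeholders, and the values go in the second parameter. Putting all of your data into the query string with % formatting defeats the purpose of using runPrepUpdate.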

Try this:

[code]
query = "INSERT INTO incidencias (tipo_error, tipo_incidencia, observaciones, f_inicio, f_sistema, dni_operario, solucionado, doc_pdf) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"

results = system.db.runPrepUpdate(query, [t_error, incidencia, observaciones, f_inicio, f_sistema, DNI, "No", data], 'DB')
[/code]

Your data is now

A PDF file is a binary file (or at least has binary parts), where random bit patterns can occur.

UTF-8, on the other hand (like quite a few other text encodings), has invalid patterns: bit sequences that can never be used in text, which keeps the text decodable and compatible with other standards. See en.wikipedia.org/wiki/UTF-8#Invalid_code_points .
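A quick way to see this outside Ignition is the plain Python sketch below. The sample bytes are made up for illustration (a PDF-style header followed by a few binary bytes), and is_valid_utf8 is just a hypothetical helper. 0xFF and 0xFE never occur in valid UTF-8, so decoding them fails with the same kind of UnicodeDecodeError as in the traceback above.

```python
# Hypothetical sample: a PDF-style header followed by arbitrary binary bytes.
# 0xFF and 0xFE can never appear anywhere in valid UTF-8.
sample = b"%PDF-1.4\n\xff\xfe\x00\x10"

def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as exc:
        # Report why the bytes are not text, then signal failure.
        print("not text: %s" % exc)
        return False

print(is_valid_utf8(b"%PDF-1.4\n"))  # plain ASCII header decodes fine
print(is_valid_utf8(sample))         # the binary part does not
```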

So a PDF file shouldn't be treated as text. Even with a parametrised query, I guess the DB schema would still treat the column as text, and it won't work as expected.

There are ways to convert binary data to clean text, such as base64 encoding, which uses 64 safe characters to encode groups of 6 bits (2^6 = 64). These encodings do waste some space (a safe character takes 8 bits in UTF-8 but encodes only 6 bits, so the output is about 4/3 the size of the input), but they are a good option when only plain-text communication or storage is available.
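As a rough sketch of the base64 option, using Python's standard base64 module: the payload is a made-up stand-in for PDF contents, and the helper names bytes_to_text/text_to_bytes are only illustrative. The encoded result is plain ASCII, so it is always valid UTF-8, and it decodes back to exactly the original bytes.

```python
import base64

def bytes_to_text(data):
    """Encode arbitrary bytes as base64 text (plain ASCII, hence valid UTF-8)."""
    return base64.b64encode(data).decode("ascii")

def text_to_bytes(text):
    """Reverse the encoding and recover the original bytes exactly."""
    return base64.b64decode(text)

# Hypothetical binary payload standing in for the contents of a PDF file.
payload = b"%PDF-1.4\n\xff\xfe\x00\x10" * 100

encoded = bytes_to_text(payload)
print(len(payload), len(encoded))  # base64 output is 4 chars per 3 input bytes, rounded up
assert text_to_bytes(encoded) == payload
```

The length comparison shows the space overhead: 4 output characters for every 3 input bytes.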

But I guess it would be a better option to not store raw PDF documents in a database at all. Store them in a structured way on the HDD, and only keep a reference to it in your database (like the file path).
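A minimal sketch of that approach, assuming a per-incident folder layout (doc_root/incidencias/ID/) and a simplified incidencias table with only id and doc_pdf columns, where doc_pdf holds the file path rather than the file contents. The sqlite3 in-memory database and temporary directories are stand-ins so the example runs anywhere; in Ignition the INSERT would go through system.db.runPrepUpdate instead, and store_pdf is a hypothetical helper.

```python
import os
import shutil
import sqlite3
import tempfile

# Assumption: sqlite3 stands in for the real database, and a temporary
# directory stands in for the shared document folder on disk.
doc_root = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidencias (id INTEGER PRIMARY KEY, doc_pdf TEXT)")

def store_pdf(conn, doc_root, source_path, incident_id):
    """Copy the PDF into a per-incident folder and record only its path."""
    target_dir = os.path.join(doc_root, "incidencias", str(incident_id))
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    target_path = os.path.join(target_dir, os.path.basename(source_path))
    shutil.copyfile(source_path, target_path)
    # Only the path (plain text) goes into the database, via a ? placeholder.
    conn.execute("INSERT INTO incidencias (id, doc_pdf) VALUES (?, ?)",
                 (incident_id, target_path))
    return target_path

# Usage: create a small fake PDF and store it for incident 42.
src = os.path.join(tempfile.mkdtemp(), "informe.pdf")
with open(src, "wb") as f:
    f.write(b"%PDF-1.4\n\xff\xfe")
stored = store_pdf(conn, doc_root, src, 42)
print(conn.execute("SELECT doc_pdf FROM incidencias WHERE id = 42").fetchone()[0])
```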

[quote="Sanderd17"]But I guess it would be a better option to not store raw PDF documents in a database at all. Store them in a structured way on the HDD, and only keep a reference to it in your database (like the file path).[/quote]Storing true binary data in modern databases is perfectly fine as long as you actually use a BLOB or BINARY column type, and properly use '?' substitution to push the binary content to the database.
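To illustrate the BLOB approach, here is a sketch that again uses Python's sqlite3 module as a runnable stand-in for the real database (the table layout and sample bytes are assumptions). The bytes are bound to a ? placeholder and stored in a BLOB column, so no text decoding ever happens and they come back unchanged.

```python
import sqlite3

# Assumption: sqlite3 stands in for the production database; in Ignition the
# same query shape (? placeholders plus an argument list) goes to runPrepUpdate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidencias (id INTEGER PRIMARY KEY, doc_pdf BLOB)")

pdf_bytes = b"%PDF-1.4\n\xff\xfe\x00\x10"  # hypothetical binary content

# The bytes travel as a bound parameter, never through string formatting,
# so the driver stores them as-is in the BLOB column.
conn.execute("INSERT INTO incidencias (id, doc_pdf) VALUES (?, ?)",
             (1, sqlite3.Binary(pdf_bytes)))

stored = conn.execute("SELECT doc_pdf FROM incidencias WHERE id = 1").fetchone()[0]
assert bytes(stored) == pdf_bytes
```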

What Phil said. I’ve stored literally millions of images and PDFs in databases and had zero problems. But I always store them as a blob.