How to handle <u> characters from scanner?

I’m receiving scanned barcode strings that look like <u>12345<u>. My question is about the <u>s.

Searching around on this subject, it seems these might be some sort of Unicode characters, but the descriptions I read suggest that Unicode escapes look different: they are a prefix only, not something on both ends of the string.

I only need the digits 12345. The <u>s seem easy enough to remove with a line of script like the one below, but...
```python
val = val.replace('<u>', '')
```

Is there anything else to consider about these <u> strings that are added to each end of the string I’m receiving from the barcode scanner?

If it’s a literal <u>, that’s not Unicode (you can run print repr(yourString) to confirm there are no other control characters); it’s just some HTML/XML-like syntax, although malformed.
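To go one step past eyeballing `repr()`, a small helper can list any ASCII control characters (code points 0-31 and 127) hiding in a scan. This is a sketch, not part of any scanner or platform API: `find_control_chars` is a hypothetical name, and the "framed" string is made up to show what a hit looks like (some scanners can be configured to add characters such as STX or a trailing carriage return).

```python
def find_control_chars(s):
    """Return (index, code point) pairs for ASCII control characters in s.

    ASCII control characters are code points 0-31 plus 127 (DEL); they are
    invisible when printed, which is why repr() is useful for spotting them.
    """
    return [(i, ord(ch)) for i, ch in enumerate(s) if ord(ch) < 32 or ord(ch) == 127]


scan = "<u>12345<u>"
print(repr(scan))                  # shows exactly what the string contains
print(find_control_chars(scan))    # []  (nothing hidden, the <u> is plain text)

# Made-up example: STX at the start and a carriage return at the end.
framed = "\x02<u>12345<u>\r"
print(repr(framed))
print(find_control_chars(framed))  # [(0, 2), (12, 13)]
```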

Depending on your scanner, there’s probably a way to turn off or otherwise change the prefix/suffix. If that’s not feasible, it’s probably fine to remove them, assuming <u> is unlikely to show up in a legitimate barcode scan.
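If the markers do have to be stripped in script, a slightly safer variant than `val.replace('<u>', '')` removes the marker only from the start and end of the string, then checks that what remains is all digits. This is a minimal sketch assuming the payload should be purely numeric; `strip_markers` and `extract_number` are hypothetical helpers, written without `str.removeprefix` so they should also run on older Python/Jython versions.

```python
def strip_markers(val, marker="<u>"):
    """Remove one leading and one trailing marker, leaving the middle untouched.

    Unlike val.replace(marker, ''), this does not delete a marker that
    happens to appear inside the payload.
    """
    if val.startswith(marker):
        val = val[len(marker):]
    if val.endswith(marker):
        val = val[:len(val) - len(marker)]
    return val


def extract_number(val, marker="<u>"):
    """Strip surrounding whitespace (e.g. a trailing CR/LF) and the markers,
    then confirm the remainder is a non-empty run of ASCII digits."""
    core = strip_markers(val.strip(), marker)
    if not core or not all(c in "0123456789" for c in core):
        raise ValueError("unexpected scan: %r" % (val,))
    return core


print(extract_number("<u>12345<u>"))  # 12345
```

Raising on anything unexpected means a malformed scan fails loudly instead of being silently turned into a wrong number.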



So it’s safe to say these are not Unicode characters?

And also that, most likely, these <u>s are being appended by the scanner?

If the scanner is adding these characters, it might make sense, because more than one type of barcode is scanned here. Perhaps it was our predecessor’s way of knowing which barcode was scanned?


I work with @rvaught. These are put in via parameters being fed into the report. Our predecessor @wking used them to differentiate between a user barcode and an order barcode. Because of their position, when the string is broken apart, positions 0-3 hold the marker for either a user barcode or a work order. This makes sorting the incoming scans much easier.
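Since the markers encode the barcode type, the sorting can key off the prefix directly. A minimal sketch, assuming `<u>` marks a user barcode; the work-order marker isn’t visible in this thread, so `<o>` below is only a placeholder to replace with the real one. `classify_scan` and `MARKERS` are hypothetical names.

```python
# Hypothetical marker table: "<u>" comes from the scans in this thread;
# "<o>" is a placeholder because the real work-order marker is not shown.
MARKERS = {
    "<u>": "user",
    "<o>": "work_order",
}


def classify_scan(raw):
    """Return (kind, payload) for a scan wrapped as <marker>payload<marker>."""
    raw = raw.strip()  # drop surrounding whitespace such as a trailing CR/LF
    for marker, kind in MARKERS.items():
        wrapped = raw.startswith(marker) and raw.endswith(marker)
        if wrapped and len(raw) >= 2 * len(marker):
            return kind, raw[len(marker):len(raw) - len(marker)]
    return "unknown", raw


for scan in ["<u>12345<u>", "<o>98765<o>", "12345"]:
    print(classify_scan(scan))
```

Checking with `startswith`/`endswith` rather than slicing fixed positions keeps working if a marker of a different length is ever added.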


Yeah, Unicode characters will show up in the string either as (usually obviously) non-ASCII characters or, if they’re escaped, as something like \u0123 or \xAB.
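As a quick way to apply that rule of thumb, the hypothetical helper below lists every non-ASCII character in a string using the same escape style Python’s `repr()`/`ascii()` use (`\xNN`, `\uNNNN`, `\UNNNNNNNN`). An empty result for a scan like `<u>12345<u>` means the `<u>` is plain ASCII text, not an encoded Unicode character.

```python
def non_ascii_escapes(s):
    """List each non-ASCII character in s as an \\xNN, \\uNNNN or \\UNNNNNNNN escape."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            out.append("\\U%08x" % cp)
        elif cp > 0xFF:
            out.append("\\u%04x" % cp)
        elif cp > 0x7F:
            out.append("\\x%02x" % cp)
        # code points 0x00-0x7F are ASCII and are skipped
    return out


print(non_ascii_escapes("<u>12345<u>"))       # []
print(non_ascii_escapes(u"caf\u00e9 \u0123"))  # ['\\xe9', '\\u0123']
```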
