How to clean a string from invisible characters?

danielps1818 · March 18, 2026, 11:17am

Hi!

I have a script that reads from an Excel file. The file contains a list of names and configurations to import as tags.

However, Excel sometimes writes non-printable characters, and those are not valid in tag names.
Is there any way to clean a string so that it only contains characters valid in tag names? The idea is to clean it so I only get the intended name.

Here is my problem: the invisible WJ (word joiner) character...

Thx in advance!

Matrix_Engineering · March 18, 2026, 12:12pm

Spaces are allowed in tag names, even if some people severely dislike this.

However, in your example, and in a quick test I did (8.1.35), Ignition is ignoring the WJ anyway?

danielps1818 · March 18, 2026, 12:17pm

My problem is creating the tags.

[Error_Configuration("The name '⁠NonAvailableStatus' is not a valid tag name")]

Kevin.Herron · March 18, 2026, 12:22pm

Try translate

>>> import string
>>> s = "foo\r\nbar"
>>> s.translate(None, string.whitespace)
'foobar'
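
One caveat: string.whitespace only covers ASCII whitespace, so it would not catch the word joiner (U+2060). A minimal sketch of an alternative, which drops every character in the Unicode "format" category (Cf), WJ included. The helper name strip_format_chars is made up for illustration, and this has not been checked against Ignition's actual tag-name rules:

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_format_chars(text):
    # Hypothetical helper: drop Unicode "format" characters (category Cf),
    # e.g. U+2060 WORD JOINER, U+200B ZERO WIDTH SPACE, U+FEFF BOM.
    # Expects a unicode string.
    return u''.join(c for c in text if unicodedata.category(c) != 'Cf')

cleaned = strip_format_chars(u'\u2060NonAvailableStatus')
```

Accented letters are left untouched, since only the Cf category is filtered.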

automatisation · March 18, 2026, 12:32pm

Or a small regexp works
import re
''.join(re.findall(r'\w+', u'Non\u2060Available'))

pturmel · March 18, 2026, 12:51pm

My Integration Toolkit has mungeColumnName() functions (expression and script) that will do this for you. Originally created to make dataset column names into jython-acceptable variable names for the view() expression function.

danielps1818 · March 18, 2026, 1:39pm

I think I'm going to do this:

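import unicodedata

# Extra characters to keep besides letters; text must be a unicode string.
allowed = set(u"0123456789_ '-:()")
''.join(
	c for c in text
	if c in allowed or unicodedata.category(c)[0] == 'L'
)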

Note: I haven't checked whether this performs badly on large amounts of data.

danielps1818 · March 18, 2026, 1:40pm

It does work for me, but it removes the accents (I'm Spanish).

Things like Compresión turn into Compresin.

automatisation · March 18, 2026, 2:04pm

Then, instead of \w, use [0-9A-Za-zÀ-ÿ].
It should cover your needs

''.join(re.findall("[0-9A-Za-zÀ-ÿ]+" ,'Compresión'))

Edit: forget that, it does not work for a tag name.
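
If the goal is to keep the accents, another option might be to stay with \w but compile the pattern with re.UNICODE and pass a unicode string, so that ó counts as a word character. A sketch, assuming the extra allowed characters from danielps1818's post (space, ', -, :, parentheses). The function name clean_tag_name is hypothetical, and whether Ignition accepts every resulting name is not verified here:

```python
# -*- coding: utf-8 -*-
import re

# Matches anything that is NOT a word character (letters, digits, underscore)
# and NOT one of the extra allowed characters: space ' - : ( )
_INVALID = re.compile(u"[^\\w '\\-:()]", re.UNICODE)

def clean_tag_name(text):
    # Hypothetical helper: expects a unicode string, removes invalid characters.
    return _INVALID.sub(u'', text)

result = clean_tag_name(u'Compres\u2060i\u00f3n (A-1)')
```

Here the word joiner is removed while the ó and the punctuation survive.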

JordanCClark · March 18, 2026, 3:53pm

Apache StringUtils

from org.apache.commons.lang3 import StringUtils

stringList = ['Compresión', 'Crème brûlée', 'über', 'garçon', 'Señor']

for s in stringList:
	print StringUtils.stripAccents(unicode(s))

output:

Compresion
Creme brulee
uber
garcon
Senor
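
For comparison, roughly the same effect can be had without the Java import by decomposing the string and dropping the combining marks. This is a sketch using only the standard unicodedata module. It is not claimed to match StringUtils.stripAccents in every edge case, and strip_accents is a made-up name:

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    # Decompose each character (NFD), then drop the combining marks
    # (category Mn) that carried the accents. Expects a unicode string.
    decomposed = unicodedata.normalize('NFD', text)
    return u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

names = [u'Compresi\u00f3n', u'Cr\u00e8me br\u00fbl\u00e9e', u'\u00fcber',
         u'gar\u00e7on', u'Se\u00f1or']
stripped = [strip_accents(n) for n in names]
```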

JordanCClark · March 18, 2026, 7:13pm

Had some time to sit with this a bit more. If you're working with unicode, it's usually wise to import unicode_literals

StringUtils seems able to do everything you need:

from __future__ import unicode_literals
from org.apache.commons.lang3 import StringUtils

def normalize(stringIn, replaceDict={'':''}):
	return StringUtils.stripAccents(StringUtils.replaceEach(StringUtils.normalizeSpace(stringIn), replaceDict.keys(), replaceDict.values()))

# Dictionary of any odd values you want to filter for.
# e.g. the word joiner (WJ) is '\u2060'; it has to be listed here because it is not considered whitespace.
replaceDict = {'\u2060' : ''}


stringList = ['  Compres\u2060ión', 'Crème\r\nbrûlée', 'über', 'garçon', 'Señor']

for s in stringList:
	repr(normalize(s, replaceDict))

output:

"u'Compresion'"
"u'Creme brulee'"
"u'uber'"
"u'garcon'"
"u'Senor'"