Hi!
I have a script that reads from an Excel file. The file contains a list of names and configuration to import as tags.
However, sometimes Excel writes non-printable characters, and those are not valid in tag names.
Is there any way to clean a string so that it only contains characters that are valid in tag names? The idea is to clean it so I only get the intended name.
Here is my problem: the WJ (word joiner) invisible character...
Thx in advance!
Spaces are allowed in tag names, even if some people severely dislike this.
However, in your example, and in a quick test I did (8.1.35), Ignition is ignoring the WJ anyway?
My problem is creating the tags.
[Error_Configuration("The name 'NonAvailableStatus' is not a valid tag name")]
Try translate
>>> import string
>>> s = "foo\r\nbar"
>>> s.translate(None, string.whitespace)
'foobar'
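One caveat: the two-argument translate(None, ...) form only exists on Python 2 byte strings, and the word joiner (U+2060) is not part of string.whitespace anyway. If the names arrive as unicode, a rough alternative is to drop every character whose Unicode category starts with "C" (control, format, and so on). This is only a sketch under that assumption; strip_invisible is a made-up helper name, not something Ignition provides.

```python
import unicodedata

def strip_invisible(text):
    # text is expected to be a unicode string.
    # Drop characters in the Unicode "Other" categories (Cc control, Cf format, ...).
    # U+2060 WORD JOINER is category Cf; \r and \n are Cc. Ordinary spaces (Zs) are kept.
    return u''.join(c for c in text if not unicodedata.category(c).startswith('C'))

cleaned = strip_invisible(u'Non\u2060Available\r\nStatus name')
```

Unlike the translate example, this keeps ordinary spaces, since spaces are allowed in tag names.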
Or a small regexp works
import re
''.join(re.findall(r'\w+', 'NonAvailable'))
My Integration Toolkit has mungeColumnName() functions (expression and script) that will do this for you. Originally created to make dataset column names into jython-acceptable variable names for the view() expression function.
I think I'm going to do this:
import unicodedata
allowed = set("0123456789_ '-:()")
''.join(
    c for c in text
    if c in allowed or unicodedata.category(c)[0] == 'L'
)
Note: I haven't checked whether this implementation performs badly on large amounts of data.
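For what it's worth, here is a quick check of that whitelist idea on an accented name with an embedded word joiner. Treat it as a sketch: clean_tag_name and ALLOWED_EXTRA are hypothetical names, the extra-character set is simply the one from the snippet above, and it has not been checked against Ignition's full tag-name rules.

```python
import unicodedata

# Built once, rather than once per character.
ALLOWED_EXTRA = frozenset(u"0123456789_ '-:()")

def clean_tag_name(text):
    # text is expected to be a unicode string.
    # Keep the listed characters plus any Unicode letter (categories Lu, Ll, Lt, Lm, Lo).
    return u''.join(
        c for c in text
        if c in ALLOWED_EXTRA or unicodedata.category(c)[0] == 'L'
    )

# \u2060 is the word joiner, \xf3 is the precomposed letter "ó", \t is a tab.
result = clean_tag_name(u'Compres\u2060i\xf3n (A-1)\t')
```

The accent survives here because the precomposed "ó" is itself a letter. If the input were in decomposed form (base letter plus combining accent), the combining mark would be dropped, so normalizing with unicodedata.normalize('NFC', text) first may be worth considering.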
That works for me, but it removes the accents (I'm Spanish).
Things like Compresión turn into Compresin.
Then, instead of \w, use [0-9A-Za-zÀ-ÿ].
It should cover your needs
''.join(re.findall("[0-9A-Za-zÀ-ÿ]+" ,'Compresión'))
Edit : forget that, it does not work for a tag name.
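Another option for the regexp route, assuming the name is a unicode string: with the re.UNICODE flag, \w also matches accented letters, so no explicit À-ÿ range is needed. This is only a sketch; like the plain \w version it also drops spaces and punctuation, and keep_word_chars is a hypothetical name.

```python
import re

# re.UNICODE makes \w follow Unicode rules (this is already the default for str in Python 3).
WORD_RUN = re.compile(r'\w+', re.UNICODE)

def keep_word_chars(text):
    # Join the runs of letters, digits and underscores; everything else is dropped,
    # including the invisible word joiner (U+2060).
    return u''.join(WORD_RUN.findall(text))

result = keep_word_chars(u'Compres\u2060i\xf3n')
```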
Apache StringUtils
from org.apache.commons.lang3 import StringUtils
stringList = ['Compresión', 'Crème brûlée', 'über', 'garçon', 'Señor']
for s in stringList:
    print StringUtils.stripAccents(unicode(s))
output:
Compresion
Creme brulee
uber
garcon
Senor
>>>
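If you would rather not depend on the Apache library, something similar can be sketched with the standard unicodedata module: decompose the text to NFD and drop the combining marks. This approximates stripAccents rather than replacing it exactly, and strip_accents is a hypothetical name.

```python
import unicodedata

def strip_accents(text):
    # text is expected to be a unicode string.
    # NFD splits an accented letter into a base letter plus combining mark(s);
    # combining marks have category Mn and are filtered out.
    decomposed = unicodedata.normalize('NFD', text)
    return u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

# Same sample names as above, written with escapes so the file encoding does not matter.
names = [u'Compresi\xf3n', u'Cr\xe8me br\xfbl\xe9e', u'\xfcber', u'gar\xe7on', u'Se\xf1or']
stripped = [strip_accents(n) for n in names]
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged with this approach.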
I had some time to sit with this a bit more. If you're working with unicode, it's usually wise to import unicode_literals.
StringUtils seems like it can do everything you need:
from __future__ import unicode_literals
from org.apache.commons.lang3 import StringUtils
def normalize(stringIn, replaceDict={'':''}):
    return StringUtils.stripAccents(StringUtils.replaceEach(StringUtils.normalizeSpace(stringIn), replaceDict.keys(), replaceDict.values()))
# Dictionary of any odd values you want to filter for.
# e.g. Word Joiner is '\u2060' and has to be listed here because it is not considered whitespace.
replaceDict = {'\u2060' : ''}
stringList = [' Compres\u2060ión', 'Crème\r\nbrûlée', 'über', 'garçon', 'Señor']
for s in stringList:
    repr(normalize(s, replaceDict))
output:
"u'Compresion'"
"u'Creme brulee'"
"u'uber'"
"u'garcon'"
"u'Senor'"
>>>