Tag Naming, Characters not allowed

francois.beduneau · December 7, 2021, 3:31pm

Hi There,

I’m having difficulty creating tags whose names use non-Latin characters.
Context: users can create tags from a Vision application, which calls system.tag.configure().
I would like to build a filter that keeps only “authorized” characters.
I saw in the docs that Ignition is very permissive about tag names: Tag Naming
But I wonder what “valid Unicode letter” means. Is it a matter of Unicode categories to avoid (Unicode category), Unicode blocks that are not allowed (Unicode block), or something else?

Does anyone have more information or clarification on which characters are allowed / not allowed?

Thanks in advance,
Francois.

victordcq · December 7, 2021, 3:39pm

I don’t know everything that is possible, but you could use try/except and show a popup message about the (name) error.
It at least seems easier than trying to filter user input yourself.

francois.beduneau · December 7, 2021, 4:20pm

Sadly, the function “system.tag.configure()” doesn’t throw any error.
The error comes from the gateway and appears in the gateway log.
e.g.: (Logger: Provider, Error_Configuration(“The name ‘------- Tag Name ----------’ is not a valid tag name”))

lrose · December 7, 2021, 4:22pm

You can’t have a - as the first character in a tag name. But perhaps that is just you redacting the tag name?

Kevin.Herron · December 7, 2021, 4:29pm

Found this in some test code... seems accurate?

Valid names start with a letter or underscore, and contain any of: letters, digits, underscore, space, parenthesis, single quote, dash, or colon.
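
As a rough sketch, the rule quoted above could be checked in Python like this. It is not Ignition's own code: `is_valid_tag_name`, `EXTRA_CHARS` and the `allow_leading_digit` switch are hypothetical names, "letter" is assumed to mean any Unicode general category starting with `L`, and "digit" is assumed to mean ASCII 0-9.

```python
import unicodedata

# Punctuation accepted after the first character, per the rule quoted above.
EXTRA_CHARS = u" '-:()"
# Assumption: "digits" means ASCII digits only.
ASCII_DIGITS = u"0123456789"


def is_letter(ch):
    # The letter categories Lu, Ll, Lt, Lm and Lo all start with "L".
    return unicodedata.category(ch).startswith("L")


def is_valid_tag_name(name, allow_leading_digit=False):
    """Return True if name follows the quoted rule (hypothetical helper)."""
    if not name:
        return False
    first = name[0]
    first_ok = (is_letter(first) or first == u"_"
                or (allow_leading_digit and first in ASCII_DIGITS))
    if not first_ok:
        return False
    for ch in name[1:]:
        if not (is_letter(ch) or ch == u"_"
                or ch in ASCII_DIGITS or ch in EXTRA_CHARS):
            return False
    return True
```

With this sketch, `is_valid_tag_name(u"Pump_1 (north)")` is accepted while `is_valid_tag_name(u"-pump")` is rejected. The `allow_leading_digit` switch is only there in case digits turn out to be accepted in first position.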

francois.beduneau · December 8, 2021, 10:34am

Actually, I need to be more precise...
When I said:

I’m having difficulty creating tags whose names use non-Latin characters.

I meant I have issues with non-Latin alphabets.
Example:
I currently have issues with some characters from the Thai alphabet: ชื้; ลี่;...
For example, the character "ลี่" is a concatenation of ล + ี + ่ (unicode: \u0e25\u0e35\u0e48), and it is not allowed in a tag name. But if I take only "ล" (\u0e25), it works.

I suppose I will have the same or similar issues with other alphabets / syllabaries (Cyrillic, Chinese, Japanese, ...).
This is why I wanted more precision about what is allowed / not allowed, so that I can create a filter function that removes every "forbidden" character.

So far I use this function:

import unicodedata
# Function to return Valid Tag name
#	Input: Original Name (Unicode)
#	Output: Normalized name (Unicode)
def NormalizeName(OriginalName):
	print "OriginalName: ", OriginalName, " Type: ", type(OriginalName)
	MyNormalizeName = ''		# Init Normalize Name 
	for characters in unicodedata.normalize('NFD', OriginalName): 
		print characters, " -> ", unicodedata.category(characters)
		if unicodedata.category(characters) != 'Mn':	# Filter on category Mn (nonspacing mark)
			MyNormalizeName += characters				# Append to the normalized name
			#print MyNormalizeName
	print "MyNormalizeName: ", MyNormalizeName, " Type: ", type(MyNormalizeName)
	return MyNormalizeName

The problem with this one is that it removes every diacritic (Diacritic), so it is less permissive than what Ignition allows...
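
One possible middle ground, sketched here under assumptions: compose with NFC instead of decomposing with NFD, then drop only the marks that are still separate code points. A letter such as "é" has a precomposed form in a letter category, so it survives, while the Thai marks have no precomposed form and are removed. `strip_unmergeable_marks` is a hypothetical name, and whether Ignition accepts every precomposed letter has not been verified here.

```python
import unicodedata


def strip_unmergeable_marks(name):
    """Compose what can be composed, then drop leftover combining marks."""
    # NFC merges base letter + mark into one code point where Unicode defines one.
    composed = unicodedata.normalize("NFC", name)
    # Categories Mn, Mc and Me (all starting with "M") are combining marks.
    return u"".join(ch for ch in composed
                    if not unicodedata.category(ch).startswith("M"))
```

For instance, `strip_unmergeable_marks(u"cafe\u0301")` gives `u"caf\u00e9"`, and `strip_unmergeable_marks(u"\u0e25\u0e35\u0e48")` gives `u"\u0e25"`.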

victordcq · December 8, 2021, 10:57am

just avoid any weird characters like that, maybe some work for Ignition but they will cause trouble elsewhere for sure xd

pascal.fragnoud · December 8, 2021, 11:13am

I agree with Victor here; I’d use only a-z, A-Z, 0-9 and _ ("^[a-zA-Z0-9_]*$") for tag names.
Now, it would be cool to have a displayName prop on tags that accepts any character and is used for things like tag browsers.

pturmel · December 8, 2021, 1:26pm

Tags have a standard documentation property that can contain anything you like.

francois.beduneau · December 8, 2021, 1:48pm

Thank you guys,
In a way I agree with you: it would be much easier to limit tag names to these characters. Unfortunately, I am not building this application for myself, and for some users it would not make sense to avoid their local language. It would also be a pity not to use the flexibility Ignition offers for tag naming.

Kevin.Herron · December 8, 2021, 2:46pm

Tag naming is ultimately enforced by a regular expression that looks for characters in the Unicode letter categories (\p{L}). Unfortunately, that means a character sequence made of a letter followed by code points in a non-letter Unicode category (U+0E35 and U+0E48 are both “nonspacing mark”) will not match the regex and is therefore not considered a valid tag name.
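
To see this for a given name, the general category of each code point can be listed with Python's `unicodedata` module. This is only an inspection aid, and `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text):
    """Return (code point, general category) pairs for each character."""
    return [("U+%04X" % ord(ch), unicodedata.category(ch)) for ch in text]


# The Thai sequence from the question: one letter followed by two marks.
pairs = describe(u"\u0e25\u0e35\u0e48")
# pairs == [('U+0E25', 'Lo'), ('U+0E35', 'Mn'), ('U+0E48', 'Mn')]
```

Only the first code point is in a letter category (`Lo`); the two `Mn` code points are what a letter-only pattern rejects.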

francois.beduneau · December 9, 2021, 9:31am

Great, this is exactly what I wanted to know.
Thank you for your support!

For those interested, here is the filter function to convert a unicode string to a valid tag name:

import unicodedata
# Function to return a valid tag name
#	Input: original name (unicode)
#	Output: valid tag name (unicode)
def ToValidTagName(OriginalName):
	AuthoCategory = ["Lu","Ll","Lt","Lm","Lo"]		# Authorized Unicode categories (letters)
	AuthoFirstCharct = ["_","0","1","2","3","4","5","6","7","8","9"]	# Characters authorized in any position, including the first
	AuthoOther = [" ","'","-",":","(",")"] 	# Other authorized characters (except in first position)
	ValidTagName = ''	# Init tag name
	for characters in OriginalName: 
		if unicodedata.category(characters) in AuthoCategory:	# Filter on category
			ValidTagName += characters
		elif characters in AuthoFirstCharct:	# Filter on characters allowed everywhere
			ValidTagName += characters	
		elif characters in AuthoOther and len(ValidTagName)!=0:	# Filter on characters allowed if not in first position
			ValidTagName += characters
		else:	
			# Character not authorized: drop it
			pass
	return ValidTagName

victordcq · December 9, 2021, 9:41am

you might want to give it a “replace” character to indicate an unknown character was used

pascal.fragnoud · December 9, 2021, 9:42am

And to avoid ending up with an empty tag name
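
A minimal sketch of both suggestions, assuming the same character sets as the filter function posted above: rejected characters are swapped for a replacement instead of being dropped, and an empty result falls back to a default name. The names `to_valid_tag_name`, `replacement` and `fallback` are hypothetical, and the replacement is assumed to be a character that is itself valid in first position, such as "_".

```python
import unicodedata

ALLOWED_ANYWHERE = u"_0123456789"   # same set as AuthoFirstCharct above
ALLOWED_AFTER_FIRST = u" '-:()"     # same set as AuthoOther above


def to_valid_tag_name(original, replacement=u"_", fallback=u"_unnamed"):
    """Replace rejected characters and never return an empty name."""
    out = []
    for ch in original:
        if unicodedata.category(ch).startswith("L") or ch in ALLOWED_ANYWHERE:
            out.append(ch)
        elif ch in ALLOWED_AFTER_FIRST and out:
            out.append(ch)
        else:
            # Mark the spot so the user can see something was changed.
            out.append(replacement)
    # An empty input (or an empty replacement) could leave nothing behind.
    return u"".join(out) or fallback
```

For example, `to_valid_tag_name(u"Pump #1")` gives `u"Pump _1"` and `to_valid_tag_name(u"")` gives `u"_unnamed"`. Two different inputs can still map to the same name, so a uniqueness check before calling system.tag.configure() may also be worth adding.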