Python search strings with partial match

jlandwerlen · March 9, 2023, 4:36pm

string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'Second/Third/SomethingElse/Etc'

Is there a native way in Python to search for partial matches starting at any point and ending at any point?

In the examples above, 'Second/Third' is a partial match between the two strings, and that is the value I would want returned.

bkarabinchak.psi · March 9, 2023, 4:38pm

What value do you want returned? Just whether it's in the string or not, or the character index where the substring starts?

jlandwerlen · March 9, 2023, 4:39pm

In the above example, the matching section which is 'Second/Third'.

bkarabinchak.psi · March 9, 2023, 4:41pm

I think you need regular expressions. Be warned: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." They're not really that bad, but I can never remember the syntax without looking it up. In Jython the library is imported with import re. Here's a nice tester I've used when needing to do this stuff, but you will need to look up the syntax yourself, or someone else may have to help you: https://regex101.com/

paul-griffith · March 9, 2023, 4:48pm

So far, this is an underspecified question.

How small of a 'partial match' is acceptable?
E.g., both examples have many of the same letters repeating.
Are your inputs always going to be slash-separated? If so, is what you care about actually "2+ tokens in a consistent order"? That's a lot easier (and faster) to solve for than arbitrary string subsequences, and shouldn't require any regex - split on /, organize into n-length tuples, run the (quadratic) comparison operations, brute force.
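The split-and-compare idea described above could be sketched roughly as follows. This is only one possible reading of it: the function name longest_common_run and the min_tokens parameter are made up for illustration, and it assumes the result wanted is the longest run of consecutive segments that appears in both paths.

```python
def longest_common_run(path_a, path_b, sep='/', min_tokens=1):
    """Return the longest run of consecutive segments found in both paths.

    Hypothetical helper: the name and the min_tokens parameter are
    illustrative assumptions, not something from the thread.
    """
    a = path_a.split(sep)
    b = path_b.split(sep)
    # Try the longest window first so the first hit is the longest match.
    for n in range(min(len(a), len(b)), min_tokens - 1, -1):
        # Every n-length run of consecutive segments in b, stored as tuples.
        runs_b = set(tuple(b[j:j + n]) for j in range(len(b) - n + 1))
        # Brute-force scan of the n-length runs in a, left to right.
        for i in range(len(a) - n + 1):
            if tuple(a[i:i + n]) in runs_b:
                return sep.join(a[i:i + n])
    return ''  # no shared run of at least min_tokens segments


result = longest_common_run('First/Second/Third/Fourth/Fifth',
                            'Second/Third/SomethingElse/Etc')
# result == 'Second/Third'
```

Passing min_tokens=2 would enforce the "2+ tokens" rule mentioned above, while the default of 1 also accepts a single matching segment.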

bkarabinchak.psi · March 9, 2023, 4:50pm

There is the .find() method on strings that could work for you as well, or splitting the string based on '/' and looking at the list. I agree regex should be your last solution but I also am not sure what your full use case here is.

jlandwerlen · March 9, 2023, 5:01pm

Very good point. The smallest chunk is always going to be the text between two slashes, or between the beginning or end of the string and a slash. Also, always left to right. I'm really looking for contiguous word matches.

So a possible result would be 'Second' assuming the very next section doesn't match.

I looked at regex, but it always confuses the snot out of me, so was wondering if anyone had something they have done in the past. I will try and see if I can get something to work in the meantime.

jlandwerlen · March 9, 2023, 7:11pm

Best I got.
I really struggled with regex so I found this easy enough.

Are there more pythonic ways, or issues anyone sees with this approach?

string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'Second/Third/SomethingElse/Etc'
match = [x for x in string1.split('/') if x in string2.split('/')]
string = '/'.join(match)
print string

JordanCClark · March 9, 2023, 7:59pm

Then is this one permissible?

string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'Second/OOPS/Third/SomethingElse/Etc' # Extra item in string.
match = [x for x in string1.split('/') if x in string2.split('/')]
string = '/'.join(match)
print string

This should return just Second, correct?
Actual Output:

 Second/Third

jlandwerlen · March 9, 2023, 8:11pm

Dang, thanks for pointing that out, you are correct. Any suggestions?

JordanCClark · March 9, 2023, 8:31pm

Try this one:

def partialMatch(stringIn1, stringIn2):
	# Split the first string
	string1split = stringIn1.split('/')
	
	previousIndex = None
	listOut = []
	
	for item in stringIn2.split('/'):
		# Try to find an index. Raises ValueError if it doesn't exist.
		try:
			index = string1split.index(item)
			# Check if first match or contiguous match
			if previousIndex is None or index - previousIndex == 1:
				previousIndex = index
				listOut.append(item)
			else:
				# Break if non-contiguous index
				break
		except ValueError:
			if previousIndex is not None:
				# Break if no index, and an index was previously found.
				break
	# Join list back together.
	return '/'.join(listOut)
	
# Some tests
string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'put/more/stuff/at/the/start/Second/Third/SomethingElse/Etc'
print partialMatch(string1, string2)

string2 = 'put/more/stuff/at/the/start/Fourth/Second/Third/SomethingElse/Etc'
print partialMatch(string1, string2)

string2 = 'put/more/stuff/at/the/start/Second/Fourth/Third/SomethingElse/Etc'
print partialMatch(string1, string2)

Output:

Second/Third
Fourth
Second

jlandwerlen · March 9, 2023, 8:35pm

Thanks @JordanCClark. I will give it a try. I'm sure it will work well.

Has anyone used SequenceMatcher?

from difflib import SequenceMatcher

a = "preview"
b = "previeu"
SequenceMatcher(a=a, b=b).ratio()

This looks neat but I didn't see a way to only return the matching portion.

dkhayes117 · March 9, 2023, 8:49pm

I found that when I got nerd sniped by your question. What I got so far...

import difflib
string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'Second/OOPS/Third/SomethingElse/Etc' # Extra item in string.
s = difflib.SequenceMatcher(None,string1,string2)
blox = s.get_matching_blocks()
print [string1[x.a:x.a+x.size] for x in blox]

['Second/', 'Third/', 'o', 'th', '/', 't', '']

paul-griffith · March 9, 2023, 8:52pm

~~Maybe split them first before passing to sequence matcher? You should be able to pass a list of strings in.~~ difflib is for text sequences specifically, not a generalized thing.
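For what it's worth, the difflib documentation describes SequenceMatcher as comparing sequences of any type whose elements are hashable, so the split-first idea can be sketched as below. The helper name longest_segment_match is made up for illustration, and it assumes the longest contiguous run of whole segments is the result wanted.

```python
from difflib import SequenceMatcher


def longest_segment_match(path_a, path_b, sep='/'):
    """Longest run of whole segments shared by both paths (illustrative helper)."""
    a = path_a.split(sep)
    b = path_b.split(sep)
    # Compare lists of segments rather than individual characters.
    matcher = SequenceMatcher(None, a, b)
    # Explicit bounds are passed because Python 2.7 / Jython requires them.
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    # match.a is the start index in a, match.size the number of segments.
    return sep.join(a[match.a:match.a + match.size])


string1 = 'First/Second/Third/Fourth/Fifth'
string2 = 'Second/OOPS/Third/SomethingElse/Etc'
result = longest_segment_match(string1, string2)
# result == 'Second'
```

Because whole segments are compared, stray character fragments such as 'o' or 'th' cannot show up in the result. When several runs tie for longest, find_longest_match returns the one that starts earliest in the first sequence.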

jlandwerlen · March 9, 2023, 9:03pm

By the way, I really didn't explain my actual use case. Please let me know if this is silly....

I am doing a Perspective project and my tags are structured in a very strategic manner, as I do in most of my projects. I created an exchange resource, that is very similar to what other users have done in the past. This resource is a UDT that does alarm summarization based on tag path. I have updated the UDT to include the tag path for highest priority alarm in the path. Next, my goal is to arrange my views folder structure to match the same tag structure. So, on alarm, the user can click on a button for a specific area/zone that will take them to the view with the highest priority. I did something similar on a previous project, but I had to manually input the view in the alarm UDT. That was time consuming. This is automatic, assuming the view and tag paths are done correctly. At the end of the day, if there is no match, the button will default to a main window which allows them to navigate further. This is just a click saver.

I figured it was at least worth the effort to try and see how it would work.

Matthew_Hunt · March 16, 2023, 1:41pm

If you are just checking whether a string contains another string, you can do

if "tagpathString" in "parameterString":
	pass  # your logic here

You can also use string.find(x) to find the character index where the substring starts, and use the returned integer for further processing, such as splitting the tag path at a specific character.
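A small sketch of both calls, using made-up variable names (tag_path and fragment are illustrative, not from the project). Note that both require the substring to be known in advance, so they test for a given fragment rather than discover the common one.

```python
tag_path = 'First/Second/Third/Fourth/Fifth'
fragment = 'Second/Third'

# Membership test: True when fragment occurs anywhere in tag_path.
contains = fragment in tag_path            # True

# find() returns the index of the first occurrence, or -1 when absent.
start = tag_path.find(fragment)            # 6
missing = tag_path.find('Sixth')           # -1

# The index can then be used to slice the path around the match.
head = tag_path[:start]                    # 'First/'
tail = tag_path[start + len(fragment):]    # '/Fourth/Fifth'
```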

jlandwerlen · March 16, 2023, 10:34pm

Thanks @Matthew_Hunt for the tip, but the methods you described will not work for my use case.