SOLUTION HAS BEEN PROVIDED (thanks @ekhumoro!). I have a Python dictionary whose values are lists of regular-expression patterns:
myDict = {
    'ID_1': [r'(dog|cat[a-z+]|horse)', r'(car[a-z]+|house|apple\w)', r'(bird|tree|panda)'],
    'ID_2': [r'(horse|building|computer)', r'(panda\w|lion)'],
    'ID_3': [r'(wagon|tiger|cat\w*)'],
    'ID_4': [r'(dog)'],
}
I want to treat each item in these lists as a separate regular expression. Whenever a pattern matches some text, the matched text should become a key in a separate dictionary, and its value should be the list of IDs (the original keys) whose patterns matched it. For example, suppose the patterns are used to search this string:
"dog panda cat cats pandas car carts"
The general approach I have in mind is something like:
for key, patterns in myDict.items():
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            # add this ID to the list stored under the matched text
            newDict.setdefault(match.group(0), []).append(key)
The expected output would be:
newDict = {
    'car': ['ID_1'],
    'carts': ['ID_1'],
    'dog': ['ID_1', 'ID_4'],
    'panda': ['ID_1', 'ID_2'],
    'pandas': ['ID_1', 'ID_2'],
    'cat': ['ID_1', 'ID_3'],
    'cats': ['ID_1', 'ID_3'],
}
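To check that expected output against actual behaviour, here is a minimal sketch run on the sample string. The patterns as posted would not produce exactly that output (for example, car[a-z]+ needs at least one letter after "car", so a bare "car" never matches), so this sketch assumes the patterns are rewritten with \w* quantifiers; those rewritten patterns are an assumption, not the original ones. It also skips an ID that is already listed for a matched string, since the expected output shows each ID at most once per key.

```python
import re

# Sample string from the question.
text = "dog panda cat cats pandas car carts"

# ASSUMPTION: the question's patterns rewritten with \w* so that bare
# words such as "cat", "car" and "panda" also match.
myDict = {
    'ID_1': [r'(dog|cat\w*|horse)', r'(car\w*|house|apple\w*)', r'(bird|tree|panda\w*)'],
    'ID_2': [r'(horse|building|computer)', r'(panda\w*|lion)'],
    'ID_3': [r'(wagon|tiger|cat\w*)'],
    'ID_4': [r'(dog)'],
}

newDict = {}
for key, patterns in myDict.items():
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            ids = newDict.setdefault(match.group(0), [])
            # Record each ID at most once per matched string.
            if key not in ids:
                ids.append(key)

print(newDict)
```

With these assumed patterns the resulting dictionary has the same keys and ID lists as the expected output (key order may differ, which doesn't matter for a dict).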
A matched string should become a key in newDict only if it actually occurs in the text. For example, 'carts' appears in the output because one of ID_1's patterns matched it, so ID_1 is listed as its value.
SOLUTION
import re
from collections import defaultdict
text = """
the eye of the tiger
a doggies in the manger
the cat in the hat
a kingdom for my horse
a bird in the hand
the cationic cataclysm
the pandamonious panda pandas
"""
myDict = {
    'ID_1': [r'(dog\w+|cat\w+|horse)', r'(car|house|apples)',
             r'(bird|tree|panda\w+)'],
    'ID_2': [r'(horse|building|computer)', r'(panda\w+|lion)'],
    'ID_3': [r'(wagon|tiger|cat)'],
    'ID_4': [r'(dog)'],
}

# Maps each matched string to the list of IDs whose patterns matched it.
newDict = defaultdict(list)
for key, values in myDict.items():
    for pattern in values:
        # finditer yields every non-overlapping match of the pattern in text.
        for match in re.finditer(pattern, text):
            # group(0) is the whole matched text.
            newDict[match.group(0)].append(key)

for item in newDict.items():
    print(item)
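Note that the solution matches patterns anywhere, including inside longer words: with this text, ID_3's plain cat also matches the start of "cationic" and "cataclysm", so newDict['cat'] ends up as ['ID_3', 'ID_3', 'ID_3'], and ID_4's dog matches inside "doggies". If only whole-word matches are wanted, one option (an assumption about the desired behaviour, not part of the solution above) is to wrap each pattern in \b word boundaries. This sketch also compiles each pattern once with re.compile; the name `compiled` is purely illustrative.

```python
import re
from collections import defaultdict

text = """
the eye of the tiger
a doggies in the manger
the cat in the hat
a kingdom for my horse
a bird in the hand
the cationic cataclysm
the pandamonious panda pandas
"""

myDict = {
    'ID_1': [r'(dog\w+|cat\w+|horse)', r'(car|house|apples)',
             r'(bird|tree|panda\w+)'],
    'ID_2': [r'(horse|building|computer)', r'(panda\w+|lion)'],
    'ID_3': [r'(wagon|tiger|cat)'],
    'ID_4': [r'(dog)'],
}

# ASSUMPTION: whole-word matching is wanted, so each pattern is wrapped
# in \b(?:...)\b and compiled once up front.
compiled = {
    key: [re.compile(r'\b(?:' + p + r')\b') for p in patterns]
    for key, patterns in myDict.items()
}

newDict = defaultdict(list)
for key, regexes in compiled.items():
    for regex in regexes:
        for match in regex.finditer(text):
            newDict[match.group(0)].append(key)

for item in newDict.items():
    print(item)
```

With the boundaries in place, 'cat' maps only to ['ID_3'] (from "the cat in the hat"), and 'dog' no longer appears because "doggies" is not the whole word "dog".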