I have a list of multiword expressions (MWEs) stored in a pandas Series, each in the format ('first word', 'second word'). I want to add all of them to NLTK's MWETokenizer, following the post "How to treat a phrase containing stopwords as a single token with Python nltk.tokenize".
I passed the first element to the constructor and then tried to iterate over the Series to add the rest.
Here is the code:
from nltk.tokenize import MWETokenizer
mwetokenizer = MWETokenizer([('bite', 'bullet')], separator='_')
size = len(MWE_series) - 1
i = 1
for line in range(size):
    mwetokenizer.add_mwe(MWE_series[i])
    i += 1
I don't get any error and the code runs, but the tokenizer only seems to contain the first MWE, ('bite', 'bullet'); the ones added by the add_mwe calls inside the loop are ignored. How can I fix this?
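For reference, here is a minimal sketch of the input I am trying to end up with: one plain tuple of words per MWE. It assumes the Series entries might be either real tuples or strings that only look like tuples (I have not confirmed which, so that is an assumption). The helper name to_mwe_tuple and the sample values are hypothetical, and the list stands in for MWE_series.tolist().

```python
import ast


def to_mwe_tuple(entry):
    """Return one entry as a tuple of words (hypothetical helper)."""
    if isinstance(entry, str):
        # Assumption: a string entry is the repr of a tuple,
        # e.g. "('bite', 'bullet')", so parse it back into a tuple.
        entry = ast.literal_eval(entry)
    return tuple(entry)


# Stand-in for MWE_series.tolist(); these values are made up for illustration.
raw_entries = [("bite", "bullet"), "('kick', 'bucket')"]

# Normalize every entry so each MWE is a tuple of separate word strings.
mwes = [to_mwe_tuple(entry) for entry in raw_entries]
print(mwes)
```

The resulting list could then be passed directly to MWETokenizer(mwes, separator='_'), or each tuple added in turn with add_mwe, instead of tracking the index i by hand.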