
I'm trying to build an inverted index for some NLP work to see how many times a word appears in a document. I'm doing this with a dictionary, but my output looks like this (here the word "man" appears in documents 1 and 11):

{'man': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
 'upon': [1, 1, 1, 3, 3, 3, 1539, 1539, 1539]}

How do I get rid of these duplicate values so that I just have:

{'man': [1,11], 'upon': [1,3,1539]}
Shane Bishop
David R
Does this answer your question: https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order? BTW, perhaps the best approach is not to create these lists with duplicates in the first place. – Dani Mesejo Oct 24 '21 at 23:13
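
As a rough illustration of the comment's suggestion to avoid creating duplicates in the first place, the index could be built with sets from the start. This is only a sketch: the `docs` mapping, its sample text, and the whitespace tokenization are assumptions, since the question does not show how the index is actually built.

```python
from collections import defaultdict

# Hypothetical sample corpus (document ID -> text); not taken from the question.
docs = {
    1: "the man sat upon the man",
    3: "once upon a time",
    11: "a man walked",
}

# Map each word to the set of document IDs that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():  # naive whitespace tokenization (assumption)
        index[word].add(doc_id)        # adding an ID that is already present is a no-op

# Convert the sets to sorted lists once the index is complete.
inverted = {word: sorted(ids) for word, ids in index.items()}
```

With this approach each document ID is stored at most once per word, so no cleanup pass is needed afterwards.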

1 Answer


Just convert values to sets and then back to lists:

my_dict = {k: list(set(v)) for k, v in my_dict.items()}
pavel
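
One caveat about the set round-trip in the answer above: a Python set does not guarantee any particular order, so `list(set(v))` is not guaranteed to return the IDs in ascending order. If the order matters, two possible variants are sketched below. The sample data is a shortened version of the dictionary in the question, and the names `by_value` and `by_first_seen` are made up for illustration.

```python
# Shortened version of the dictionary from the question.
my_dict = {
    "man": [1, 1, 1, 11, 11],
    "upon": [1, 1, 3, 3, 1539, 1539],
}

# Variant 1: deduplicate and sort the IDs in ascending order.
by_value = {k: sorted(set(v)) for k, v in my_dict.items()}

# Variant 2: deduplicate while keeping first-seen order
# (dict keys preserve insertion order in Python 3.7+).
by_first_seen = {k: list(dict.fromkeys(v)) for k, v in my_dict.items()}
```

For this data both variants give the same result, because the IDs were already appended in ascending order.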