Is there a way I can access just the vocabulary list of pre-trained vectors for word2vec and GloVe? I do not need the entire n-dimensional embeddings.
1 Answer
For the word2vec models, you can load them with the gensim package and read the vocabulary from the resulting KeyedVectors object.
Like this:
from gensim.models.keyedvectors import KeyedVectors

model = KeyedVectors.load_word2vec_format(filename, binary=True)
# gensim >= 4.0 exposes the vocabulary as the key_to_index mapping
# (on gensim < 4.0 it was the model.vocab dict instead)
words = list(model.key_to_index)
where filename is the path to the pretrained model; set binary=False if the vectors are stored in a text representation rather than binary.
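
As for GloVe: the standard pretrained GloVe files are plain text, one word per line followed by that word's vector components, so you can collect the vocabulary without parsing the embeddings at all. A minimal sketch, assuming one of the Stanford text files (the glove.6B.300d.txt name is just an example):

# take only the first whitespace-separated token of each line, i.e. the word
with open("glove.6B.300d.txt", encoding="utf-8") as f:
    vocab = [line.split(" ", 1)[0] for line in f]

Splitting with a maxsplit of 1 avoids tokenizing the hundreds of float values on each line, so this stays fast even for the larger files.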