
Is there a way I can access just the vocabulary list of pre-trained vectors for word2vec and GloVe? I do not need the entire n-dimensional embeddings.

Adam_G

1 Answer


For word2vec models, you can probably load them with the gensim package and read the vocabulary off the resulting KeyedVectors object. Like this:

from gensim.models.keyedvectors import KeyedVectors

model = KeyedVectors.load_word2vec_format(filename, binary=True)
words = list(model.key_to_index)  # gensim >= 4.0; use model.vocab in older versions

where filename is the path to the pretrained model. Set binary=False if the pretrained model is stored in text format.

Max Ionov