I'm trying out some different generative models, and I want to evaluate them by holding out some documents, training the models on the remaining documents, and then calculating the perplexity on the held-out documents.
However, I'm not sure what to do when a word in the held-out documents does not appear in the training documents. The model will give that word a probability of zero. Is there a standard/good procedure for dealing with this?
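To make the problem concrete, here is a minimal sketch (using a toy maximum-likelihood unigram model, just for illustration) showing how a single unseen word drives the held-out perplexity to infinity:

```python
import math
from collections import Counter

# Toy training and held-out data (hypothetical example):
# the word "jumped" never appears in the training documents.
train_docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
held_out = ["the", "cat", "jumped"]

counts = Counter(w for doc in train_docs for w in doc)
total = sum(counts.values())

def prob(word):
    # maximum-likelihood estimate: zero for unseen words
    return counts[word] / total

log_prob = 0.0
for w in held_out:
    p = prob(w)
    if p == 0.0:
        log_prob = float("-inf")  # log(0) -> perplexity blows up
        break
    log_prob += math.log(p)

perplexity = math.exp(-log_prob / len(held_out))
print(perplexity)  # inf, because "jumped" has zero probability
```

So any single out-of-vocabulary word makes the whole held-out set's perplexity infinite, which is what makes model comparison impossible without some fix.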