2

I am looking for a C++ or Java implementation that computes the perplexity of held-out documents in Latent Dirichlet Allocation (LDA). Can anybody suggest useful links?

2 Answers

1

I did some googling and found the UMass MALLET library (Java).

You can use its functions to calculate the log probability of each document, $\log p(\mathbf{w}_d)$, in your held-out set, and from those values you can calculate the perplexity with the formula from the LDA paper:

$$ \text{perplexity}(D_{\text{held-out}}) = \exp\left(-\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right) $$

where $M$ is the number of documents in the held-out set, $N_d$ is the number of words (tokens) in document $d$, and $\mathbf{w}_d$ is the sequence of words in document $d$.
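Once you have the per-document log probabilities, the formula itself is only a few lines of Java. The following is a minimal sketch, not MALLET code: the class `HeldOutPerplexity` and its method `perplexity` are hypothetical names of my own, and it assumes the log probabilities are natural logarithms (if your tool reports base-2 or base-10 logs, convert them first).

```java
/**
 * Hypothetical helper (not part of MALLET): computes held-out perplexity
 * from per-document log probabilities and per-document token counts.
 */
final class HeldOutPerplexity {

    private HeldOutPerplexity() {
        // Static utility class; no instances.
    }

    /**
     * @param docLogProbs natural-log probability log p(w_d) of each held-out document
     * @param docLengths  number of tokens N_d in each held-out document
     * @return exp(-(sum_d log p(w_d)) / (sum_d N_d))
     */
    static double perplexity(double[] docLogProbs, int[] docLengths) {
        if (docLogProbs.length != docLengths.length) {
            throw new IllegalArgumentException("Arrays must have one entry per document");
        }

        double totalLogProb = 0.0; // numerator: sum of log p(w_d) over all documents
        long totalTokens = 0;      // denominator: sum of N_d over all documents

        for (int d = 0; d < docLogProbs.length; d++) {
            totalLogProb += docLogProbs[d];
            totalTokens += docLengths[d];
        }

        if (totalTokens == 0) {
            throw new IllegalArgumentException("Held-out set contains no tokens");
        }

        return Math.exp(-totalLogProb / totalTokens);
    }
}
```

As a sanity check, two documents with 2 and 3 tokens in which every token has probability 0.25 should give a perplexity of about 4, since the average per-token log probability is $\log 0.25$.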

I found a forum post that describes some of the steps for doing this: http://t3527.ai-mallet-development.aitalk.info/model-perplexity-t3527.html

Hope that helps!

Victor Ma
  • 156
0

You might find this link useful - it's a good place to start searching for a suitable library, since many are listed here.