Evaluation of LDA

Question

I am looking for a C++ or Java implementation that computes the perplexity of held-out documents in latent Dirichlet allocation (LDA). Can anybody suggest useful links?

Victor Ma · Accepted Answer · 2015-05-12T12:02:21.777

I did some googling and came across the UMass MALLET library (Java).

You can use its functions to calculate the log probability of each document, $\log p(\mathbf{w}_d)$, in your held-out set, and from those values you can calculate perplexity with the formula from the LDA paper:

$$ \mathrm{perplexity}(D_{\text{held-out}}) = \exp\left(-\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right) $$

where $M$ is the number of held-out documents, $N_d$ is the number of words (tokens) in document $d$, and $\mathbf{w}_d$ is the sequence of words in document $d$.
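Once you have the per-document log probabilities, the last step is only a few lines. Here is a minimal sketch in plain Java; it does not use MALLET's API, and the class and method names (`HeldOutPerplexity`, `perplexity`) are made up for illustration. It assumes the log probabilities are natural logs and that you already have one value and one token count per held-out document.

```java
import java.util.Locale;

// Hypothetical helper, not part of MALLET or any other library.
class HeldOutPerplexity {

    /**
     * Computes exp(-(sum_d log p(w_d)) / (sum_d N_d)).
     *
     * @param docLogProbs    natural-log probability of each held-out document
     * @param docTokenCounts number of tokens N_d in each held-out document
     */
    static double perplexity(double[] docLogProbs, int[] docTokenCounts) {
        if (docLogProbs.length != docTokenCounts.length) {
            throw new IllegalArgumentException(
                "Need one log probability and one token count per document");
        }
        double totalLogProb = 0.0; // numerator: sum of log p(w_d)
        long totalTokens = 0;      // denominator: sum of N_d
        for (int d = 0; d < docLogProbs.length; d++) {
            totalLogProb += docLogProbs[d];
            totalTokens += docTokenCounts[d];
        }
        if (totalTokens <= 0) {
            throw new IllegalArgumentException("Held-out set contains no tokens");
        }
        return Math.exp(-totalLogProb / totalTokens);
    }

    public static void main(String[] args) {
        // Toy input (made up): every token has probability 0.25,
        // with 2 tokens in the first document and 3 in the second.
        double[] logProbs = {2 * Math.log(0.25), 3 * Math.log(0.25)};
        int[] tokenCounts = {2, 3};
        System.out.printf(Locale.ROOT, "perplexity = %.4f%n",
                perplexity(logProbs, tokenCounts));
    }
}
```

In the toy input every token has probability 0.25, so the perplexity works out to 4, which matches the usual reading of perplexity as an effective number of equally likely choices per token. Note that the formula normalizes by the total token count, not by the number of documents.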

I found a forum post that describes some of the steps for doing this: http://t3527.ai-mallet-development.aitalk.info/model-perplexity-t3527.html

Hope that helps!

score 0 · Answer 2 · answered May 12 '15 at 10:17

You might find this link useful - it's a good place to start searching for a suitable library, since many are listed here.

Dominik Filipiak
