Iteration parameter in latent dirichlet allocation model

Question

I want to find 24 topics in 800,000 documents using an LDA model, but how many iterations should I use? Training is extremely slow when the iteration count is large, for example 3000.

Are there any strategies to ensure that the result is stable? Setting the number of iterations to a large value is the only way I can think of.

Questions pertaining to processing speed are not a great fit for CV. Perhaps what you want to know is something along the lines of how many iterations are required to ensure the stability of the result? — gung - Reinstate Monica, Sep 11 '13 at 03:12
Yes, I'm sorry for not asking the right question. Are there any strategies to ensure the stability? Setting the number of iterations to a large value is the only way I can think of. — hw.fu, Sep 11 '13 at 03:33

score 1 · Answer 1 · answered Mar 17 '16 at 05:33


You can record the log perplexity (logPerplexity) after each iteration and plot it against the iteration number to see where it converges.
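As a minimal sketch of that check, assuming the per-iteration values have already been collected into a list: the helper name `first_stable_iteration`, the tolerance, and the synthetic curve below are all illustrative assumptions, not output from a real LDA run.

```python
import math


def first_stable_iteration(values, tol):
    """Return the 1-based iteration at which the change from the previous
    value first drops below tol, or None if that never happens."""
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) < tol:
            return i + 1
    return None


# Synthetic, made-up curve standing in for recorded log perplexity values.
# In practice this list would be filled by the training loop.
curve = [8.0 + 2.0 * math.exp(-0.3 * i) for i in range(40)]

stable_at = first_stable_iteration(curve, tol=1e-3)
print("change fell below tolerance at iteration:", stable_at)
```

The same list can be passed to any plotting tool to inspect the curve visually; the helper just automates reading the flat part off the graph.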


Arjun Variar


Welcome to the site, @ArjunVariar. This is being automatically flagged as low quality, probably because it is so short. At present it is more of a comment than an answer by our standards. But I think there is a real & valuable answer here. Can you expand on it? We can also turn it into a comment. – gung - Reinstate Monica Mar 17 '16 at 07:19

score 0 · Answer 2 · answered Aug 03 '16 at 20:16

Rather than trying to guess the appropriate number of iterations, you can evaluate the perplexity of the model every $k$ iterations and stop once the change between two consecutive evaluations falls within a chosen tolerance.

For example, the scikit-learn implementation of LDA lets you set this via the evaluate_every and perp_tol parameters. This excerpt from the source code demonstrates how it's done:

if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
    doc_topics_distr, _ = self._e_step(X, cal_sstats=False,
                                        random_init=False,
                                        parallel=parallel)
    bound = self.perplexity(X, doc_topics_distr,
                            sub_sampling=False)
    if self.verbose:
        print('iteration: %d, perplexity: %.4f'
                % (i + 1, bound))

    if last_bound and abs(last_bound - bound) < self.perp_tol:
        break
    last_bound = bound
self.n_iter_ += 1
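From user code, the two parameters might be used roughly as follows. This is only a sketch: the random count matrix stands in for a real document-term matrix, and the values chosen for n_components, max_iter, evaluate_every and perp_tol are illustrative assumptions rather than recommendations.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic document-term count matrix (200 documents, 50 terms).
# A stand-in for a real corpus, used only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(200, 50))

lda = LatentDirichletAllocation(
    n_components=5,           # illustrative; the question uses 24 topics
    learning_method="batch",
    max_iter=200,             # upper bound on iterations, not a target
    evaluate_every=5,         # compute perplexity every 5 iterations
    perp_tol=1e-1,            # stop when perplexity changes by less than this
    random_state=0,
)
lda.fit(X)

iterations_used = lda.n_iter_
final_perplexity = lda.perplexity(X)
print(f"iterations used: {iterations_used}, perplexity: {final_perplexity:.2f}")
```

With this setup max_iter acts as a safety cap, and n_iter_ reports how many iterations were actually run before the tolerance check stopped training. Evaluating perplexity has its own cost, so a larger evaluate_every trades a later stop for fewer evaluations.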
