
I want to find 24 topics in 800,000 documents using an LDA model, but how many iterations should I use? Training is extremely slow when the parameter is large, e.g. 3000.

Are there any strategies to ensure the results are stable? Setting a large number of iterations seems to be the only approach I can think of.

Sean Easter
hw.fu
  • Questions pertaining to processing speed are not a great fit for CV. Perhaps what you want to know is something along the lines of how many iterations are required to ensure the stability of the result? – gung - Reinstate Monica Sep 11 '13 at 03:12
  • Yes, I'm sorry for not asking the right question. Are there any strategies to ensure stable results? Setting a large number of iterations seems to be the only way I can think of. – hw.fu Sep 11 '13 at 03:33

2 Answers


You can try computing the logPerplexity at each iteration and plotting it against the iteration number to see where it converges (flattens out).
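Instead of only eyeballing the graph, the "where does it flatten out" check can be automated. The following is a minimal sketch, assuming you have already recorded one perplexity (or log-perplexity) value per iteration from whatever LDA library you use. The function name first_converged_iteration, its patience parameter, and the sample values are hypothetical and only for illustration.

```python
def first_converged_iteration(perplexities, tol, patience=1):
    """Return the 1-based iteration at which the curve has changed by less
    than `tol` between consecutive iterations for `patience` steps in a row.
    Return None if that never happens."""
    run = 0  # number of consecutive "small change" steps seen so far
    for i in range(1, len(perplexities)):
        if abs(perplexities[i] - perplexities[i - 1]) < tol:
            run += 1
            if run >= patience:
                return i + 1  # convert 0-based list index to 1-based iteration
        else:
            run = 0  # a large jump resets the streak
    return None


# Hypothetical perplexity values recorded once per iteration.
curve = [950.0, 720.0, 610.0, 575.0, 566.0, 563.5, 562.8, 562.5]
print(first_converged_iteration(curve, tol=1.0))  # prints 7
```

Requiring patience > 1 guards against stopping on a single flat step of a curve that is still falling. Note that a tolerance chosen for perplexity will not suit log-perplexity, which is on a much smaller scale.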

  • Welcome to the site, @ArjunVariar. This is being automatically flagged as low quality, probably because it is so short. At present it is more of a comment than an answer by our standards. But I think there is a real & valuable answer here. Can you expand on it? We can also turn it into a comment. – gung - Reinstate Monica Mar 17 '16 at 07:19

Rather than trying to guess the appropriate number of iterations, you can evaluate the perplexity of the model every $k$ iterations and check whether the change since the previous evaluation is within a chosen tolerance.

For example, scikit-learn's LDA implementation (LatentDirichletAllocation) lets you set this via the evaluate_every parameter (the $k$ above) and the perp_tol parameter (the tolerance). This excerpt from the source code shows how it's done:

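# Body of the main iteration loop (i = 0 .. max_iter - 1) in scikit-learn's
# LDA fit method; last_bound is initialised to None before the loop.
# Every evaluate_every iterations, re-infer the document-topic distribution
# and compute the perplexity on the full training set X.
if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
    doc_topics_distr, _ = self._e_step(X, cal_sstats=False,
                                        random_init=False,
                                        parallel=parallel)
    bound = self.perplexity(X, doc_topics_distr,
                            sub_sampling=False)
    if self.verbose:
        print('iteration: %d, perplexity: %.4f'
                % (i + 1, bound))

    # Stop early once the perplexity changes by less than perp_tol
    # between two consecutive evaluations.
    if last_bound and abs(last_bound - bound) < self.perp_tol:
        break
    last_bound = bound
self.n_iter_ += 1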
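If you are not using scikit-learn, the same stopping rule (evaluate every $k$ iterations, stop when the change is below a tolerance) can be written as a small library-agnostic loop. This is a minimal sketch that assumes your LDA library lets you run one training iteration and compute the current perplexity. The names train_with_perplexity_check, toy_step and toy_perplexity are hypothetical, and the toy model merely stands in for a real LDA implementation.

```python
from typing import Callable, List, Optional


def train_with_perplexity_check(
    run_one_iteration: Callable[[], None],
    compute_perplexity: Callable[[], float],
    max_iter: int,
    evaluate_every: int,
    perp_tol: float,
) -> List[float]:
    """Run up to max_iter iterations, evaluating perplexity every
    evaluate_every iterations, and stop once two consecutive evaluations
    differ by less than perp_tol. Returns the perplexities computed."""
    history: List[float] = []
    last: Optional[float] = None
    for i in range(max_iter):
        run_one_iteration()
        if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
            current = compute_perplexity()
            history.append(current)
            # Compare with the previous evaluation, not the previous iteration.
            if last is not None and abs(last - current) < perp_tol:
                break
            last = current
    return history


# Hypothetical toy "model": perplexity decays geometrically towards 500.
iterations_run = [0]


def toy_step() -> None:
    iterations_run[0] += 1


def toy_perplexity() -> float:
    return 500.0 + 1000.0 * 0.5 ** iterations_run[0]


history = train_with_perplexity_check(
    toy_step, toy_perplexity, max_iter=100, evaluate_every=5, perp_tol=1.0
)
print(iterations_run[0], history)
# prints: 15 [531.25, 500.9765625, 500.030517578125]
```

Here training stops after 15 of the allowed 100 iterations. Each evaluation costs an extra pass over the data (in the scikit-learn excerpt, a full E-step over X), so with 800,000 documents a larger evaluate_every keeps that overhead down at the price of coarser stopping.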
Sean Easter