I am looking for a derivation of eqn 5 in C. Moody's paper https://arxiv.org/pdf/1605.02019.pdf, where it says the loss function coming from the Dirichlet enforcement of sparsity is

$L^d=\lambda\sum_{jk}(\alpha-1)\log p_{jk}$

In a more formal treatment, such as eqn 4 of https://tminka.github.io/papers/dirichlet/minka-dirichlet.pdf, that term is only part of the whole formula. Can someone explain how Moody is able to eliminate the two Gamma terms?

bhomass

1 Answer

Equation $4$ from Minka's paper is

$$\log P(D|\alpha) = N\log\Gamma\left( \sum_k \alpha_k \right)-N\sum_k\log\Gamma\left( \alpha_k \right)+N\sum_k (\alpha_k - 1)\log \bar{p_k}$$

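If $\alpha$ is fixed and the optimization variables are $\bar{p_k}$, the first two terms do not depend on $\bar{p_k}$ at all. They are constants that shift the objective without changing its gradient or the location of its optimum, so they can be dropped for the purpose of such an optimization.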
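
A quick numerical check may make this concrete. The sketch below is only an illustration, not code from either paper: the function names, the values of `alpha` and `n`, and the two test points are all made up for the example. It evaluates the full expression from equation $4$ and the version with the Gamma terms removed at two different values of $\bar{p_k}$, and shows that the gap between them is the same constant both times.

```python
import math

def full_log_likelihood(p_bar, alpha, n):
    # All three terms of Minka's eq. 4 as quoted above.
    # The two Gamma terms depend only on alpha, never on p_bar.
    gamma_terms = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    data_term = sum((a - 1.0) * math.log(p) for a, p in zip(alpha, p_bar))
    return n * (gamma_terms + data_term)

def reduced_objective(p_bar, alpha, n):
    # Only the last term, i.e. the part that depends on p_bar.
    return n * sum((a - 1.0) * math.log(p) for a, p in zip(alpha, p_bar))

# Hypothetical values chosen for illustration only.
alpha = [0.5, 0.5, 0.5]
n = 10
p_first = [0.2, 0.3, 0.5]
p_second = [0.7, 0.2, 0.1]

# Gap between the full and reduced objectives at each point.
offset_first = full_log_likelihood(p_first, alpha, n) - reduced_objective(p_first, alpha, n)
offset_second = full_log_likelihood(p_second, alpha, n) - reduced_objective(p_second, alpha, n)

# What the gap should be: n times the two Gamma terms.
expected_offset = n * (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha))

print(offset_first, offset_second, expected_offset)
```

The three printed numbers agree up to floating-point rounding, so the two objectives differ by a quantity that is independent of $\bar{p_k}$ and therefore share the same gradient with respect to $\bar{p_k}$.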