
Suppose that $X_1, X_2, \dots, X_n$ is an iid sample from a categorical distribution with $k\in\mathbb N$ events. That is, the probability that $X$ is equal to some $x\in\{1,2,\dots,k\}$ is given by $$f_p(x) = \prod_{j=1}^kp_j^{\mathsf 1(x=j)},$$ where $\mathsf 1(\cdot)$ denotes the indicator function and $$p = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}.$$

The parameter $p$ can be estimated by Maximum Likelihood subject to the constraint $1_k \boldsymbol{\cdot} p = 1$. Here $1_k$ denotes a vector with $k$ ones and $\boldsymbol{\cdot}$ the usual inner product. Let $$n = \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_k \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n\mathsf 1(x_i=1) \\ \sum_{i=1}^n\mathsf 1(x_i=2) \\ \vdots \\ \sum_{i=1}^n\mathsf 1(x_i=k) \end{bmatrix}.$$

Then the objective function is given by $$ L(p) = n\boldsymbol{\cdot}\log(p) + \lambda_1(1_k\boldsymbol{\cdot}p - 1).$$ The log operation is understood component-wise.

It's easy to show that the MLE of $p$ is $$\hat p = (1_k\boldsymbol{\cdot} n)^{-1}n.$$
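As a quick sanity check, the unconstrained MLE is just the vector of relative frequencies. A minimal simulation sketch (the true $p$ and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
p_true = np.array([0.1, 0.2, 0.3, 0.4])      # illustrative parameter
x = rng.choice(k, size=10_000, p=p_true)     # iid sample, coded 0..k-1
counts = np.bincount(x, minlength=k)         # the count vector n above
p_hat = counts / counts.sum()                # \hat p = (1_k . n)^{-1} n
```

Here `p_hat` sums to one by construction and converges to `p_true` as the sample grows.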

I now want to add an additional linear restriction $Ap = \delta$ for some $A\in\mathbb R^{q\times k}$ with $\operatorname{rank}(A) = q$ and $\delta\in\mathbb R^q$. The objective function in this case is $$L(p) = n\boldsymbol{\cdot}\log(p) + \lambda_1(1_k\boldsymbol{\cdot}p - 1) + \lambda_2\boldsymbol{\cdot}(Ap - \delta),$$ where $\lambda_2\in\mathbb R^q$. If $$\tilde A = \begin{bmatrix} 1_k' \\ A \end{bmatrix},\quad \tilde \delta = \begin{bmatrix} 1 \\ \delta\end{bmatrix},\quad\text{and}\quad \lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix},$$ the objective function can be written as $$L(p) = n\boldsymbol{\cdot}\log(p) + \lambda\boldsymbol{\cdot}(\tilde Ap - \tilde\delta).$$

The objective function now looks tractable. However, I don't know how to proceed and derive the MLE under the additional restriction. Is it even possible to find a closed-form expression for the solution of this problem?
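Even without a closed form, the problem can be solved numerically as an equality-constrained optimization. A minimal sketch using SciPy's SLSQP solver, with a made-up example (the counts, $A$, and $\delta$ below are purely illustrative; here $k=3$, $q=1$, and the restriction is $p_1 - p_2 = 0$):

```python
import numpy as np
from scipy.optimize import minimize

counts = np.array([30.0, 50.0, 20.0])   # illustrative count vector n
A = np.array([[1.0, -1.0, 0.0]])        # restriction p_1 - p_2 = 0
delta = np.array([0.0])

# Negative log-likelihood: -(n . log p)
neg_loglik = lambda p: -counts @ np.log(p)

cons = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},  # 1_k . p = 1
    {"type": "eq", "fun": lambda p: A @ p - delta},  # A p = delta
]

p0 = np.full(3, 1.0 / 3.0)              # feasible starting point
res = minimize(neg_loglik, p0, method="SLSQP",
               bounds=[(1e-9, 1.0)] * 3, constraints=cons)
p_hat = res.x
```

For this example the constrained MLE can be checked by hand: with $p_1=p_2=t$ and $p_3=1-2t$, maximizing $80\log t + 20\log(1-2t)$ gives $t=0.4$, so $\hat p = (0.4, 0.4, 0.2)$, which the solver reproduces.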

  • When you introduce the Lagrange multipliers the equations are all linear, giving straightforward solutions. For a simple (but non-trivial) example see https://stats.stackexchange.com/a/576957/919. There, $n=3$ and the constraint is $p_1-p_2=0.$ – whuber Jun 03 '22 at 16:25
  • There is no closed form solution for the MLE in this case. A standard approach is to use gradient descent on the magnitude of the gradient of the Lagrangian (the critical points of the Lagrangian itself are saddle points, so applying gradient descent directly on the Lagrangian will not work). @whuber the equations are linear in $\lambda$ but not in $p$ – J. Delaney Jun 03 '22 at 16:59
  • @J.Delaney It looks to me like the system is linear in $1/p$ and in $\lambda.$ Did I misunderstand something? – whuber Jun 03 '22 at 17:23
  • @whuber $p$ and $\lambda$ are vectors here, so you have something like $p_j = n_j / \sum_k A_{jk}\lambda_k$ which results in a nonlinear equation for the $\lambda$'s – J. Delaney Jun 03 '22 at 17:51
  • @J.Delaney Yes, of course they are vectors. But it looks like the gradient of the log likelihood is linear in $1/p$ and, of course, the gradients of the constraints w.r.t. $p$ are constants. Introducing the Lagrange multipliers gives a system of linear equations in the $1/p_i$ and the $\lambda_j.$ I see now that what makes the problem nonlinear is that the constraints themselves are linear in the $p_i,$ not in the $1/p_i.$ – whuber Jun 03 '22 at 17:57
  • Thank you for the replies. So the final answer is that there is no closed-form solution, correct? – Quertiopler Jun 09 '22 at 10:02

0 Answers