
I would like to compute the maximum likelihood estimator of a general two-dimensional discrete probability distribution $p_{ij}$ with $i=1,\ldots,N$ and $j=1,\ldots,N$, given a set of counts $c_{ij}$ and fixed marginal probabilities $\sum_j p_{ij} = p^{(x)}_i$ and $\sum_i p_{ij} = p^{(y)}_j$. The values $c_{ij}$, $p^{(x)}_i$ and $p^{(y)}_j$ are inputs.

I would like to know if there is a closed form solution to the above problem.

I have worked on the problem using Lagrange multipliers. The objective function is $$ F\left(\{p_{ij}\}, \{\lambda^{(x)}_i\}, \{\lambda^{(y)}_j\}\right) = \sum_{ij} c_{ij} \log p_{ij} - \sum_{i} \lambda^{(x)}_i \left(\sum_j p_{ij} - p^{(x)}_i\right) - \sum_{j} \lambda^{(y)}_j \left(\sum_i p_{ij} - p^{(y)}_j\right). $$ Setting the derivatives with respect to $p_{ij}$ to zero gives $p_{ij} = \frac{c_{ij}}{\lambda^{(x)}_i+\lambda^{(y)}_j}$ for all $i,j$. Substituting these expressions into the stationarity conditions for the $\lambda$'s (i.e. the marginal constraints) gives $$ \sum_j \frac{c_{ij}}{\lambda^{(x)}_i+\lambda^{(y)}_j} = p^{(x)}_i \text{ for all $i$} $$ and $$ \sum_i \frac{c_{ij}}{\lambda^{(x)}_i+\lambda^{(y)}_j} = p^{(y)}_j \text{ for all $j$}. $$

I do not know how to solve the above equations except by numerical methods. I would like a closed form if possible.
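For reference, this is the kind of numerical solution I have in mind: a minimal sketch in Python using NumPy and SciPy's least-squares solver (the function and variable names `fit_multipliers`, `c`, `px`, `py` are mine, purely illustrative). Note that shifting $(\lambda^{(x)}_i, \lambda^{(y)}_j) \to (\lambda^{(x)}_i + a, \lambda^{(y)}_j - a)$ leaves every $p_{ij}$ unchanged, so one extra residual is added to pin that degeneracy.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_multipliers(c, px, py):
    """Numerically solve
        sum_j c_ij / (lx_i + ly_j) = px_i  and  sum_i c_ij / (lx_i + ly_j) = py_j
    for the Lagrange multipliers, then return p_ij = c_ij / (lx_i + ly_j).
    Sketch only; in practice bounds or damping may be needed to keep the
    denominators positive during the iteration."""
    N = c.shape[0]
    n = c.sum()  # total number of observations

    def residuals(lam):
        lx, ly = lam[:N], lam[N:]
        denom = lx[:, None] + ly[None, :]      # lx_i + ly_j
        p = c / denom                          # candidate p_ij
        res_rows = p.sum(axis=1) - px          # row-marginal conditions
        res_cols = p.sum(axis=0) - py          # column-marginal conditions
        gauge = np.array([lx[0] - ly[0]])      # pin the shift degeneracy
        return np.concatenate([res_rows, res_cols, gauge])

    # Without the marginal constraints the MLE is c_ij / n, i.e. lx_i + ly_j = n,
    # so lx = ly = n/2 is a reasonable starting point.
    lam0 = np.full(2 * N, n / 2.0)
    sol = least_squares(residuals, lam0)
    lx, ly = sol.x[:N], sol.x[N:]
    return c / (lx[:, None] + ly[None, :])

# Toy usage: with the empirical marginals as targets, the solver should
# recover the unconstrained MLE c_ij / n.
rng = np.random.default_rng(0)
c = rng.integers(1, 20, size=(3, 3)).astype(float)
px = c.sum(axis=1) / c.sum()
py = c.sum(axis=0) / c.sum()
p_hat = fit_multipliers(c, px, py)
print(p_hat, p_hat.sum())
```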

  • Maximum likelihood estimation must involve a data set and a parametric family of probability distributions, right? What are those two in your problem? And what are the $c_{ij}$'s? – Kumara Mar 04 '14 at 11:32
  • This is a general two-dimensional distribution over a finite set of possible outcomes. Let $X$ be a random variable on the set $\Omega = \{1,\ldots,N\} \times \{1,\ldots,N\}$. $c_{ij}$ is the number of occurrences of the tuple $(i,j)$ in the random sample of $X$ (this is the data set). The probability family is specified by the probability of each outcome, $\text{Prob}((i,j)) = p_{ij}$. The $p_{ij}$ are constrained by $\sum_j p_{ij} = p^{(x)}_i$ for all $i$ and $\sum_i p_{ij} = p^{(y)}_j$ for all $j$, where $p^{(x)}_i$ and $p^{(y)}_j$ are given information. – John Jumper Mar 07 '14 at 09:11
  • Ah, I see. Then the problem is the same as minimizing $D(\hat{p}\,\|\,p)$ over all $p$ satisfying the marginal constraints, because $\sum_{ij} c_{ij} \log p_{ij} = n \sum_x \hat{p}(x) \log p(x) = n\left(-D(\hat{p}\,\|\,p)+\sum_x \hat{p}(x)\log \hat{p}(x)\right)$, where $n=\sum_{ij} c_{ij}$ is the sample size and $\hat{p}$ the empirical distribution. – Kumara Mar 08 '14 at 07:39
  • I haven't seen minimization of KL divergence in the second argument over a family determined by marginal constraints. This monograph http://books.google.co.in/books/about/Information_Theory_and_Statistics.html?id=eTXlN0LRiFgC&redir_esc=y by Imre Csiszar (Chapter 4) deals only with minimization of KL divergence in the first argument over marginal constraints. I don't know whether this would be helpful to you at all. – Kumara Mar 08 '14 at 07:45
  • I appreciate the reference. Based on Ireland and Kullback (see the answer below), I ended up going with the minimum chi-square estimate subject to the marginal constraints, instead of maximum likelihood. This gives me an optimization problem with a quadratic objective, linear constraints for the marginals, and positivity constraints on the $p_{ij}$. It is solvable by standard quadratic programming solvers (one possible formulation is sketched after these comments), which is quite helpful because the exact maximum likelihood equations tended to give nonlinear root finders trouble. – John Jumper Mar 09 '14 at 07:41
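For reference, here is a sketch of one way the quadratic program mentioned in the last comment could be posed. The comment does not say which chi-square variant or solver was used, so the modified (Neyman) chi-square and the cvxpy library below are my own assumptions, purely for illustration.

```python
import cvxpy as cp
import numpy as np

def min_chisq_estimate(c, px, py):
    """Modified (Neyman) minimum chi-square estimate of p_ij subject to
    fixed marginals and positivity -- a quadratic program.
    Assumes all counts c_ij > 0 (the objective divides by c_ij)."""
    n = c.sum()
    N1, N2 = c.shape
    p = cp.Variable((N1, N2), nonneg=True)          # positivity constraints
    # sum_ij (n*p_ij - c_ij)^2 / c_ij : quadratic in p since c is constant
    objective = cp.Minimize(cp.sum(cp.multiply(cp.square(n * p - c), 1.0 / c)))
    constraints = [cp.sum(p, axis=1) == px,          # row marginals
                   cp.sum(p, axis=0) == py]          # column marginals
    cp.Problem(objective, constraints).solve()
    return p.value
```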

1 Answer


I noticed that the problem is discussed extensively in Ireland and Kullback(!) 1968 (http://biomet.oxfordjournals.org/content/55/1/179.short), which also references earlier material. I derived the correct maximum likelihood conditions, but according to Ireland and Kullback there is no known closed-form solution. They provide several iterative methods, as well as discussions of other objective functions, for this problem.
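For concreteness, one well-known iterative scheme in this line of work is iterative proportional fitting (Deming–Stephan), which repeatedly rescales the empirical table so its marginals match the given ones. The sketch below is illustrative only; it is not claimed to reproduce the exact procedures in Ireland and Kullback, and it yields the minimum-discrimination-information adjustment of $c_{ij}/n$ rather than the exact MLE discussed above.

```python
import numpy as np

def iterative_proportional_fitting(c, px, py, n_iter=200, tol=1e-12):
    """Alternately rescale rows and columns of the empirical table so that
    its marginals match px and py.  Assumes every row and column of c has
    positive total count."""
    p = c / c.sum()                               # start from the empirical distribution
    for _ in range(n_iter):
        p *= (px / p.sum(axis=1))[:, None]        # match row marginals
        p *= (py / p.sum(axis=0))[None, :]        # match column marginals
        if np.allclose(p.sum(axis=1), px, atol=tol):
            break
    return p
```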