This may well be a textbook question - I am not a statistician and just need to solve a practical problem - so I would be grateful for any pointers to the literature.
There is an $M\times N$ matrix $P$ defined as the outer product of two vectors $X$ and $Y$ (of lengths $M$ and $N$); that is, $P=XY'$. Here $X$, $Y$, and $P$ are all hidden.
There is another $M\times N$ matrix $T$, whose entries are i.i.d. (normal or uniform; the exact distribution is not very important). $T$ is observed.
$M\times N$ binomial experiments are conducted, with the entries of $P$ as success probabilities and the entries of $T$ as the numbers of Bernoulli trials; that is, $S_{ij}\sim\mathrm{Binomial}(T_{ij},P_{ij})$. The resulting matrix of success counts $S$ is observed.
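For concreteness, here is a minimal simulation of this setup (a sketch only; the ranges, the seed, and names such as `rng` are my own illustrative choices, not part of the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 30

# Hidden factors, scaled so that every entry of P = XY' lies in (0, 1).
X = rng.uniform(0.01, 0.1, size=M)
Y = rng.uniform(0.01, 0.1, size=N)
P = np.outer(X, Y)                      # M x N success probabilities

# Observed trial counts, on the order of thousands.
T = rng.integers(1000, 3000, size=(M, N))

# Observed success counts: S_ij ~ Binomial(T_ij, P_ij).
S = rng.binomial(T, P)
```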
The goal is to estimate $P$ (equivalently, to estimate $X$ and $Y$, which are identified only up to a multiplicative constant).
A sloppy method, simply dividing the marginal (row and column) sums of $S$ by the corresponding marginal sums of $T$ to obtain estimates of $X$ and $Y$ (up to a constant), works well when the distribution of $T$ is tight (or, more generally, when $T$ is otherwise close to rank $1$).
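In code, the sloppy estimator might look as follows (a sketch, reusing `S` and `T` from the simulation above; `x_hat`, `y_hat`, and the normalization choice are mine, since $X$ and $Y$ carry a free constant anyway):

```python
# Marginal-ratio ("sloppy") estimates, each up to a common constant.
x_hat = S.sum(axis=1) / T.sum(axis=1)   # proportional to X when T is near rank 1
y_hat = S.sum(axis=0) / T.sum(axis=0)   # proportional to Y

# Fix the free constant so the implied expected counts match the observed total.
scale = S.sum() / (T * np.outer(x_hat, y_hat)).sum()
P_hat = scale * np.outer(x_hat, y_hat)
```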
I thought that solving a system of $M+N-1$ "almost-linear" equations (with the elements of $T$ as coefficients, the elements of $X$ and $Y$ as variables, and the marginal sums of $S$ as constants - I can elaborate if their form is not obvious) would give better estimates of $X$ and $Y$. It turned out that these estimates are often worse than the "sloppy" ones.
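As I understand them, those equations match each marginal sum of $S$ to its expectation; one way to solve them is by alternating fixed-point updates (again a sketch under my reading of the system, not necessarily the solver actually used):

```python
# Row equations:    sum_j T_ij x_i y_j = sum_j S_ij   (i = 1..M)
# Column equations: sum_i T_ij x_i y_j = sum_i S_ij   (j = 1..N)
# One equation is redundant: both families imply the same grand total.
p_bar = S.sum() / T.sum()
x = np.full(M, np.sqrt(p_bar))          # crude initialization: x_i y_j ~ p_bar
y = np.full(N, np.sqrt(p_bar))
for _ in range(200):                    # alternate until numerical convergence
    x = S.sum(axis=1) / (T @ y)         # row equations solved for x, y held fixed
    y = S.sum(axis=0) / (T.T @ x)       # column equations solved for y, x held fixed
P_hat_eq = np.outer(x, y)
```

Incidentally, these updates are also the score equations of the Poisson approximation $S_{ij}\sim\mathrm{Poisson}(T_{ij}x_iy_j)$, which may be relevant when comparing this system against the sloppy estimator.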
How should I exploit the dependency structure among the elements of $P$? Is a hierarchical Bayesian model the best approach here? The entries of $P$ are usually small, so even though the entries of $T$ are on the order of thousands, typical entries of $S$ are rather small ($0$ is not uncommon, though not typical). I understand this limits the achievable accuracy dramatically, but I still believe I am not extracting the full accuracy possible. Oh, and both $M$ and $N$ are in the range $10$ to $100$.