
This may well be a textbook question (I am not a statistician and just need to solve a practical problem), so I would be grateful for any pointers to the literature.

There is an $M\times N$ matrix $P$ defined as the (matrix) product of vectors $X$ and $Y$ (of lengths $M$ and $N$); that is, $P=XY'$. Here $X$, $Y$, and $P$ are all hidden.

There is another $M\times N$ matrix $T$, with i.i.d. entries (normal or uniform; the exact distribution is not very important). $T$ is observed.

$M\times N$ binomial experiments are conducted, using the entries of $P$ as success probabilities and the entries of $T$ as the numbers of Bernoulli trials; that is, $S_{ij}\sim\mathrm{Binomial}(T_{ij},P_{ij})$. The resulting matrix of success counts $S$ is observed.
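
To fix ideas, here is a minimal R sketch of this generating process; the sizes and parameter values are made up purely for illustration:

 # minimal sketch of the generating process; all values here are illustrative
 M <- 10; N <- 15
 X <- runif(M); Y <- runif(N)
 P <- outer(X, Y) / 10                              # entries of P are small probabilities
 T <- matrix(round(runif(M * N, 900, 1100)), M, N)  # trial counts of the order of thousands
 S <- matrix(rbinom(M * N, size = c(T), prob = c(P)), M, N)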

The goal is to estimate $P$ (equivalently, to estimate $X$ and $Y$ up to a multiplicative constant).

A sloppy method of simply dividing the marginal sums of $S$ by the marginal sums of $T$ to arrive at estimates of $X$ and $Y$ (up to a constant) works well when the distribution of $T$ is tight (or when the rank of $T$ is otherwise close to $1$); see the sketch below.
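
In R, this sloppy estimator is just the marginal success rates, assuming the observed matrices are held in S and T:

 # sloppy estimator: marginal rates, proportional to X and Y up to constants
 Xhat <- rowSums(S) / rowSums(T)
 Yhat <- colSums(S) / colSums(T)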

I thought that solving a system of $M+N-1$ "almost-linear" equations (with elements of $T$ serving as coefficients, elements of $X$ and $Y$ as variables, and the marginal sums of $S$ as constants; their form is written out below) would give better estimates of $X$ and $Y$. It turned out that these estimates are often worse than the "sloppy" ones.
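
Concretely, taking expectations of the marginal sums gives $\sum_j S_{ij} \approx X_i \sum_j T_{ij} Y_j$ for each row $i$ and $\sum_i S_{ij} \approx Y_j \sum_i T_{ij} X_i$ for each column $j$; one of these $M+N$ equations is redundant, since each set sums to the same grand total.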

How should I exploit the dependency structure among the elements of $P$? Is a hierarchical Bayesian model the best approach here? The entries of $P$ are usually small, so even though $T$ is of the order of thousands, typical entries of $S$ are rather small ($0$ is not uncommon, though not typical). I understand this affects the accuracy of the estimates dramatically, but I still believe I am not extracting the full accuracy possible. Oh, and both $M$ and $N$ are in the range $10$-$100$.

    The elements of $T$ can't have a continuous distribution (like the normal), since it only takes integer values. To be honest it looks to me like perhaps a two-way main-effects loglinear model (simply conditioning on $T$) should do what you're after (I presume, give estimates of $X$ and $Y$) – Glen_b Sep 17 '13 at 01:05
  • You are right, Poisson would be more appropriate for $T$ in my application. – Andris Birkmanis Sep 17 '13 at 12:46

1 Answer


Here's a toy example of what I was talking about in my comment.

Since $X$ and $Y$ are only defined up to a multiplicative constant, I will take a multiplicative constant out so that the first component of each of $X$ and $Y$ is exactly $1$; this makes the model identifiable.

That is, let $P = cXY'$, where $X[1]=Y[1]=1$.

Here's my data:

$S$:

12    2    9
 9    1    5
 2    2    5
 4    0    2
 7    1    6

$T$:

34   54   34
43   41   40
48   45   40
43   41   45
48   46   30
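
If you want to follow along, one way to enter these matrices in R (just transcribing the values above, filled column by column) is:

 S <- matrix(c(12, 9, 2, 4, 7,
                2, 1, 2, 0, 1,
                9, 5, 5, 2, 6), nrow = 5)
 T <- matrix(c(34, 43, 48, 43, 48,
               54, 41, 45, 41, 46,
               34, 40, 40, 45, 30), nrow = 5)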

What we will do is use a binomial generalized linear model (glm) to identify $P$, $X$ and $Y$.
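
With a log link, the model being fitted is $\log P_{ij} = \log c + \log X_i + \log Y_j$, so the intercept estimates $\log c$, and the row- and column-factor coefficients estimate $\log X_i$ and $\log Y_j$ (the first level of each factor is the baseline, which is exactly the constraint $X[1]=Y[1]=1$).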

I will work in R. I assume the $S$ and $T$ data arrays have been read into S and T (e.g. as above). First, set up the data as vectors:

 t <- c(T)  # flatten the trial counts into a vector (column-major order)
 s <- c(S)  # flatten the success counts likewise
 x <- as.factor(c(row(T)))  # set up the predictors for row and column effects
 y <- as.factor(c(col(T)))

Now we estimate a loglinear binomial model:

 mfit2 <- glm(cbind(s, t - s) ~ x + y, family = binomial(link = "log"))  # fit the model
 summary(mfit2)  # information about the fit -- some output omitted here

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.1385     0.2081  -5.471 4.49e-08 ***
x2           -0.5913     0.2947  -2.006 0.044803 *  
x3           -1.1837     0.3658  -3.236 0.001213 ** 
x4           -1.5569     0.4322  -3.602 0.000316 ***
x5           -0.6255     0.3031  -2.064 0.039011 *  
y2           -1.8917     0.4300  -4.399 1.09e-05 ***
y3           -0.1148     0.2311  -0.497 0.619378    
---
Null deviance: 57.4824  on 14  degrees of freedom
Residual deviance:  7.8047  on  8  degrees of freedom
AIC: 64.464

Pull out the information we are interested in:

 # extract X, Y and the scaling constant
 X <- exp(c(0, mfit2$coefficients[2:5]))  # row effects; first component fixed at 1
 Y <- exp(c(0, mfit2$coefficients[6:7]))  # column effects; first component fixed at 1
 C <- exp(mfit2$coefficients[1])          # the scaling constant c

Look at the fitted proportions:

 # table of fitted proportions (estimated P) given by the glm:
 matrix(fitted(mfit2), nrow = 5)
           [,1]       [,2]       [,3]
[1,] 0.32030801 0.04830964 0.28556677
[2,] 0.17732488 0.02674457 0.15809187
[3,] 0.09806341 0.01479016 0.08742726
[4,] 0.06751972 0.01018349 0.06019639
[5,] 0.17135709 0.02584450 0.15277136

 # compare with a direct calculation of the fitted proportions from the coefficients:
 C * outer(X, Y)
                      y2         y3
   0.32030801 0.04830964 0.28556677
x2 0.17732488 0.02674457 0.15809187
x3 0.09806341 0.01479016 0.08742726
x4 0.06751972 0.01018349 0.06019639
x5 0.17135709 0.02584450 0.15277136

Now compute the fitted counts:

 # fitted counts (fitted E(S) given by the model; compares well with S):
 C * outer(X, Y) * T
                    y2       y3
   10.890472 2.6087204 9.709270
x2  7.624970 1.0965275 6.323675
x3  4.707044 0.6655574 3.497090
x4  2.903348 0.4175231 2.708838
x5  8.225140 1.1888468 4.583141

The estimates of the multiplicative effects:

 # the estimated C, X (row effects) and Y (column effects):
 C; X; Y
(Intercept) 
   0.320308 
                 x2        x3        x4        x5 
1.0000000 0.5536074 0.3061535 0.2107962 0.5349760 
                 y2        y3 
1.0000000 0.1508224 0.8915380 

You could allocate the effect of $c$ to $X$ and $Y$ in any proportions you like by taking $c = c_x c_y$ and letting $X^* = c_x X$ and $Y^* = c_y Y$.
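
For example, one arbitrary choice is to split $c$ evenly between the two factors:

 cx <- sqrt(C); cy <- sqrt(C)  # split c evenly between X and Y
 Xstar <- cx * X
 Ystar <- cy * Y
 all.equal(c(outer(Xstar, Ystar)), c(C * outer(X, Y)))  # the fitted P is unchanged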

As you see, fitting this stuff is rather straightforward. It can be done in just about any decent stats package.

Relevant literature would be any introductory material on GLMs, though most material on binomial GLMs will use the logit link (or occasionally the probit or complementary log-log link) rather than the log link. However, once you have the basics, understanding how things work with other links is pretty straightforward.
