Failing to recover true parameters in simulations of multinomial logit

Question

I am trying to understand how multinomial logistic regression works. In the first round, I am using the following model representation: $$ P(Y=k|X_1=x_1,X_2=x_2)=\frac{\exp(\beta_{k0}+\beta_{k1}x_2+\beta_{k2}x_2)}{\color{red}1+\sum_{j=\color{red}2}^K \exp(\beta_{j0}+\beta_{j1}x_2+\beta_{j2}x_2)} $$ where $k$ is the category of $Y$, and it can be any value between $1$ and $K$. I simulate data according to the model and then estimate the model with the R function multinom from the nnet package. Unfortunately, I am getting parameter estimates that are far away from the true values.

The simulation code in R for an examples with 3 categories and two independent variables is given below.

# An auxiliary function that takes regressors and coefficients and produces probabilities according to the multinomial logit model
mnlogitgen1=function(X,Beta){
 if(is.null(dim(X))) X=t(X)
 p=cbind(1,exp(X%*%Beta))/(1+rowSums(exp(X%*%Beta)))
 return(p)
}

# Set parameter values of the data generating process (DGP)
n=1e4 # sample size
beta10= 0.5; beta11=2; beta12=-2;   beta1=c(beta10,beta11,beta12)
beta20=-0.5; beta21=2; beta22=-2;   beta2=c(beta20,beta21,beta22)
beta30= 0.5; beta31=1; beta32= 1;   beta3=c(beta30,beta31,beta32)

# Generate random variables x1, x2
set.seed(1); x1=rnorm(n); set.seed(2); x2=rnorm(n)

# Generate the probabilities
p=mnlogitgen1(X=cbind(1,x1,x2),Beta=cbind(beta2,beta3))
# Generate an auxiliary random variable u that is uniformly distributed between 0 and 1
set.seed(0); u=runif(n,min=0,max=1)

# Determine categories of Y based on probabilities
u1=which(u<=p[,1])
u2=which((u>p[,1])&(u<=p[,1]+p[,2]))
u3=which(u>p[,1]+p[,2])

# Generate y
y=rep(NA,n)
y[u1]=1; y[u2]=2; y[u3]=3
# Encode y as factor to reflect the numbers mean categories    
y=factor(y)

And here is the fitting bit:

# Fit a multinomial logistic regression
library(nnet)
m1=multinom(y~x1+x2)
print(m1)

with the following result:

> m1
Call:
multinom(formula = y ~ x1 + x2)

Coefficients:
  (Intercept)        x1        x2
2  -0.5247573 1.9761060 -1.937964
3   0.5011046 0.9772012  1.028550

The estimated coefficients turn out to be similar to beta2 and beta3 -- so far so good.

In the second round, I try another representation of the multinomial logit model: $$ P(Y=k|X_1=x_1,X_2=x_2)=\frac{\exp(\beta_{k0}+\beta_{k1}x_2+\beta_{k2}x_2)}{\sum_{j=\color{red}1}^K \exp(\beta_{j0}+\beta_{j1}x_2+\beta_{j2}x_2)}. $$ In this case I swap the line

p=mnlogitgen1(X=cbind(1,x1,x2),Beta=cbind(beta2,beta3))

with the line

p=mnlogitgen2(X=cbind(1,x1,x2),Beta=cbind(beta1,beta2,beta3))

and the function mnlogitgen1 with mnlogitgen2:

mnlogitgen2=function(X,Beta){
 if(is.null(dim(X))) X=t(X)
 p=exp(X%*%Beta)/rowSums(exp(X%*%Beta))
 return(p)
}

When I fit the model, the coefficients do not resemble the true values:

> m1
Call:
multinom(formula = y ~ x1 + x2)

Coefficients:
  (Intercept)           x1          x2
2 -1.03717237 -0.008854468 -0.00652646
3  0.01580405 -0.973528437  2.91347034

Questions:

What mistake am I making?
How should I modify the procedure to obtain estimates close to the true parameter values?

Jesper for President · Accepted Answer · 2019-11-05T12:55:33.370

Consider the second model

$$p_k = \frac{\exp(\beta_{k0} + x_1\beta_{k1} + x_2\beta_{k2})}{\sum_j \exp(\beta_{j0} + x_1\beta_{j1} + x_2\beta_{j2})}$$

Divide all these probabilities with a constant $c$ in numerator and denominator

$$p_k = \frac{\exp(\beta_{k0} + x_1\beta_{k1} + x_2\beta_{k2})/c}{\sum_j \exp(\beta_{j0} + x_1\beta_{j1} + x_2\beta_{j2})/c}$$

trivially getting the same probabilities ... Now let the constant be

$$c = \exp(\beta_{10} + x_1\beta_{11} + x_2\beta_{12})$$ the numerator of the first alternative then the probabilities become

$$p_k = \frac{\exp(\beta_{k0}-\beta_{10} + x_1(\beta_{k1}-\beta_{11}) + x_2(\beta_{k2}-\beta_{12}))}{\sum_j \exp(\beta_{j0}-\beta_{10} + x_1(\beta_{j1}-\beta_{11}) + x_2(\beta_{j2}-\beta_{12}))}$$

which with proper choice of definitions becomes

$$p_k = \frac{\exp(\lambda_{k0} + x_1\lambda_{k1} + x_2\lambda_{k2})}{1+\sum_{j=2}^K \exp(\lambda_{k0} + x_1\lambda_{k1} + x_2\lambda_{k2})}$$

for $k>1$ and then

$$p_1 = \frac{1}{1+\sum_{j=2}^K \exp(\lambda_{k0} + x_1\lambda_{k1} + x_2\lambda_{k2})}$$

The point is that the choice frequencies observed only identifies through the probabilities, so same probabilities gives the same likelihood value, but the probabilities are kept the same when deviding with the clever chosen constant, even though 3 parameters were removed from the model. Hence since the likelihood is the same these parameters are not identified. That is also why the estimation code only gives you estimates of coefficients for TWO alternatives even though there are THREE alternatives in the model.

So the mistake you are maken is to include non-identified parameters in your model (and expect that you can recover them).

You correct the mistake by setting all params of one alternative to 0, which is what you did in the first case.

Let us assume you estimate under standard assumption that $\beta_{10},\beta_{11},\beta_{12}$ are all $0$ but in reality they were not when you simulated then the estimates are (i suspect)

$$\hat\lambda_{k0}=\beta_{k0}-\beta_{10} , \hat \lambda_{k1}=\beta_{k1}-\beta_{11} , \hat \lambda_{k2} =\beta_{k2}-\beta_{12}$$ so just add the parameters of alternative 1 to the estimates to recover original "true" values.

Thanks for a great answer! A quick additional question (disclosure: I am just being too lazy to work out the details myself): in my second approach, do you see a quick way in R of recovering the "canonical" parameters from my specified parameters, so that I can anticipate what R will be estimating and compare the estimates to these "canonical" values? — Richard Hardy, Nov 05 '19 at 12:39

Failing to recover true parameters in simulations of multinomial logit

1 Answers1

Linked