I am trying to understand how multinomial logistic regression works. In the first round, I am using the following model representation:
$$
P(Y=k|X_1=x_1,X_2=x_2)=\frac{\exp(\beta_{k0}+\beta_{k1}x_2+\beta_{k2}x_2)}{\color{red}1+\sum_{j=\color{red}2}^K \exp(\beta_{j0}+\beta_{j1}x_2+\beta_{j2}x_2)}
$$
where $k$ is the category of $Y$, and it can be any value between $1$ and $K$.
I simulate data according to the model and then estimate the model with the R function multinom from the nnet package. Unfortunately, I am getting parameter estimates that are far away from the true values.
The simulation code in R for an examples with 3 categories and two independent variables is given below.
# An auxiliary function that takes regressors and coefficients and produces probabilities according to the multinomial logit model
mnlogitgen1=function(X,Beta){
if(is.null(dim(X))) X=t(X)
p=cbind(1,exp(X%*%Beta))/(1+rowSums(exp(X%*%Beta)))
return(p)
}
# Set parameter values of the data generating process (DGP)
n=1e4 # sample size
beta10= 0.5; beta11=2; beta12=-2; beta1=c(beta10,beta11,beta12)
beta20=-0.5; beta21=2; beta22=-2; beta2=c(beta20,beta21,beta22)
beta30= 0.5; beta31=1; beta32= 1; beta3=c(beta30,beta31,beta32)
# Generate random variables x1, x2
set.seed(1); x1=rnorm(n); set.seed(2); x2=rnorm(n)
# Generate the probabilities
p=mnlogitgen1(X=cbind(1,x1,x2),Beta=cbind(beta2,beta3))
# Generate an auxiliary random variable u that is uniformly distributed between 0 and 1
set.seed(0); u=runif(n,min=0,max=1)
# Determine categories of Y based on probabilities
u1=which(u<=p[,1])
u2=which((u>p[,1])&(u<=p[,1]+p[,2]))
u3=which(u>p[,1]+p[,2])
# Generate y
y=rep(NA,n)
y[u1]=1; y[u2]=2; y[u3]=3
# Encode y as factor to reflect the numbers mean categories
y=factor(y)
And here is the fitting bit:
# Fit a multinomial logistic regression
library(nnet)
m1=multinom(y~x1+x2)
print(m1)
with the following result:
> m1
Call:
multinom(formula = y ~ x1 + x2)
Coefficients:
(Intercept) x1 x2
2 -0.5247573 1.9761060 -1.937964
3 0.5011046 0.9772012 1.028550
The estimated coefficients turn out to be similar to beta2 and beta3 -- so far so good.
In the second round, I try another representation of the multinomial logit model: $$ P(Y=k|X_1=x_1,X_2=x_2)=\frac{\exp(\beta_{k0}+\beta_{k1}x_2+\beta_{k2}x_2)}{\sum_{j=\color{red}1}^K \exp(\beta_{j0}+\beta_{j1}x_2+\beta_{j2}x_2)}. $$ In this case I swap the line
p=mnlogitgen1(X=cbind(1,x1,x2),Beta=cbind(beta2,beta3))
with the line
p=mnlogitgen2(X=cbind(1,x1,x2),Beta=cbind(beta1,beta2,beta3))
and the function mnlogitgen1 with mnlogitgen2:
mnlogitgen2=function(X,Beta){
if(is.null(dim(X))) X=t(X)
p=exp(X%*%Beta)/rowSums(exp(X%*%Beta))
return(p)
}
When I fit the model, the coefficients do not resemble the true values:
> m1
Call:
multinom(formula = y ~ x1 + x2)
Coefficients:
(Intercept) x1 x2
2 -1.03717237 -0.008854468 -0.00652646
3 0.01580405 -0.973528437 2.91347034
Questions:
- What mistake am I making?
- How should I modify the procedure to obtain estimates close to the true parameter values?