
I am simulating data from a basic mixed effects model $$ y_{it}=\alpha_i+\beta_t+\gamma x_{it}+\varepsilon_{it}. $$ I then estimate a corresponding fixed effects model on the simulated data. I get good estimates for individual fixed effects but poor estimates for time fixed effects. Why could that be?

(Actually, the estimates of fixed effects are all biased. I am not worried about that as I think the location is not identified.)
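That the location is not identified can be seen directly: shifting every $\alpha_i$ by a constant $c$ and every $\beta_t$ by $-c$ leaves all fitted values unchanged. A minimal sketch (with illustrative values, not the simulation's parameters):

```r
# Sketch: the level of the fixed effects is not identified, because
# (alpha_i + c) + (beta_t - c) = alpha_i + beta_t for any constant c
alphas <- c(0.5, -0.2)
betas  <- c(1.0,  0.3)
c0     <- 7
y1 <- outer(alphas,      betas,      "+")  # y_it = alpha_i + beta_t
y2 <- outer(alphas + c0, betas - c0, "+")  # shifted parameters
all.equal(y1, y2)  # TRUE: the data cannot distinguish the two parameterizations
```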

[Figure: estimated vs. true fixed effects; the individual effects lie close to the identity line while the time effects do not]


R code:

#---------- Simulation

Set the number of individuals and the number of time periods:

n=100; T=100

Generate parameter values:

set.seed(1); alphas = runif(n=n, min=-1, max=1)
set.seed(2); betas  = runif(n=T, min=-1, max=1)
gamma = 1

Generate the covariate x and the error term eps:

set.seed(3); x   = rnorm(n=n*T); X   = matrix(x,   nrow=n, byrow=TRUE)
set.seed(4); eps = rnorm(n=n*T); Eps = matrix(eps, nrow=n, byrow=TRUE)

Obtain the dependent variable y

Y = matrix(NA, ncol=T, nrow=n)
for(i in 1:n){
  for(t in 1:T){
    Y[i,t] = alphas[i] + betas[t] + gamma*X[i,t] + Eps[i,t]
  }
}
y = c(t(Y))
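As an aside, the double loop can also be written in vectorized form with `outer()`, which produces the same matrix. A self-contained sketch with small illustrative dimensions:

```r
# Sketch: outer(alphas, betas, "+") builds the n-by-T matrix with
# entries alpha_i + beta_t, replacing the explicit double loop
n <- 4; T <- 3; gamma <- 1
set.seed(1); alphas <- runif(n, -1, 1)
set.seed(2); betas  <- runif(T, -1, 1)
set.seed(3); X   <- matrix(rnorm(n*T), nrow = n)
set.seed(4); Eps <- matrix(rnorm(n*T), nrow = n)

# loop version
Y <- matrix(NA, nrow = n, ncol = T)
for (i in 1:n) for (t in 1:T) Y[i,t] <- alphas[i] + betas[t] + gamma*X[i,t] + Eps[i,t]

# vectorized version
Y2 <- outer(alphas, betas, "+") + gamma*X + Eps

all.equal(Y, Y2)  # TRUE
```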

Create identifiers of individuals (obj) and time periods (time)

obj  = c(1, rep(0, T-1)); obj = rep(obj, n); obj = cumsum(obj)
time = c(1:T); time = rep(time, n)
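The cumsum construction is equivalent to building the identifiers directly with `rep()`, which some readers may find easier to follow. A sketch with small illustrative dimensions:

```r
# Sketch: two equivalent ways to build the individual and time identifiers
n <- 3; T <- 2

# cumsum construction (as above)
obj  <- cumsum(rep(c(1, rep(0, T - 1)), n))
time <- rep(1:T, n)

# direct construction: one block of T rows per individual
obj2  <- rep(1:n, each  = T)
time2 <- rep(1:T, times = n)

all(obj == obj2) & all(time == time2)  # TRUE
```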

#---------- Fixed effects estimation

Estimate a fixed effects model that accounts for the individual and time heterogeneity:

m3 = lm(y ~ -1 + factor(obj) + factor(time) + x)
summary(m3)

Extract the estimates of the individual and time fixed effects:

alphas_hat3 = m3$coef[1:n]
betas_hat3  = m3$coef[(n+1):(n+T)]
cor(alphas, alphas_hat3)^2
cor(betas,  betas_hat3)^2

Plot them against the true parameter values:

dev.new()
mar1 = c(4, 4, 3, 0.5)
par(mfrow=c(2,1), mar=mar1)
plot(x=alphas, y=alphas_hat3, xlab="true", ylab="fitted", main="Individual fixed effects"); abline(a=0, b=1)
plot(x=betas,  y=betas_hat3,  xlab="true", ylab="fitted", main="Time fixed effects");       abline(a=0, b=1)
par(mfrow=c(1,1))

Richard Hardy

1 Answer


In theory, the columns of the design matrix of m3 would be linearly dependent, since the column vectors identifying the objects and the column vectors identifying the time points each sum to $\mathbf{1}_{n\cdot T}$. However, lm calls model.matrix internally, which in turn drops one column (in this case the one indicating time point 1) to generate a design matrix of full column rank. But this means that m3 contains the estimated coefficients of a model without $\beta_1$ (i.e., a model in which $\beta_1$ is forced to equal zero). The new parameters in terms of the original parameters are given by $$ \tilde{\alpha}_i = \alpha_i + \beta_1 \;\;\forall\, i \in \left\{1,\ldots,n\right\}, \qquad \tilde{\beta}_t = \beta_t - \beta_1 \;\;\forall\, t \in \left\{2,\ldots,T\right\}. $$ One could therefore consider something like

alphas_hat3 <- m3$coef[1:n] - betas[1]            # undo the +beta_1 shift (true beta_1 is known here, since the data are simulated)
betas_hat3  <- m3$coef[(n+1):(n+T-1)] + betas[1]  # undo the -beta_1 shift; only T-1 time coefficients are estimated
betas       <- betas[-1]                          # drop beta_1, which has no estimated counterpart

which, with your seeds, yields

cor(alphas, alphas_hat3)^2
# 0.984594
cor(betas, betas_hat3)^2
# 0.98608

and the following plot:

[Figure: corrected estimates plotted against the true parameter values, with both panels close to the identity line]

Other seeds also generate plots in which the estimated coefficients are more evenly spread around the identity line.
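The rank deficiency described above can be checked directly by building the two dummy blocks and inspecting the rank of their concatenation. A sketch with small illustrative dimensions:

```r
# Sketch: with full dummy sets for both factors, the design matrix has
# n + T columns but rank only n + T - 1, because the obj dummies and
# the time dummies both sum to the all-ones vector
n <- 3; T <- 2
obj  <- factor(rep(1:n, each  = T))
time <- factor(rep(1:T, times = n))
M <- cbind(model.matrix(~ -1 + obj), model.matrix(~ -1 + time))
ncol(M)     # n + T = 5 columns
qr(M)$rank  # rank n + T - 1 = 4, so lm must drop one column
```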

statmerkur
  • Perfect, thank you! If you have time, perhaps you could also help me with a related one: https://stats.stackexchange.com/questions/593605? – Richard Hardy Oct 27 '22 at 08:54