I'm looking for help with the notation for the regression equation of a repeated-measures model with nested data, $\eqref{eq:2}$, and with connecting that notation back to my model specification, Model.3, in R.
My starting point is the notation I am familiar with from Wooldridge's Introductory Econometrics (2013). In Wooldridge's notation, the general longitudinal model used for both Random Effects and Fixed Effects estimation can be written as
$$ y_{it} = \beta_{0}+\beta_{1}x_{it} + a_{i}+u_{it} \tag{1} \label{eq:1} $$
where individuals are indexed by $i = 1, 2, \dots, n$ and time by $t = 1, 2, \dots, T$. The error term has two parts: $a_i$, an unobserved individual-specific component that captures unobserved, time-constant factors, and $u_{it}$, the idiosyncratic error, which captures unobserved factors that change over time.
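To make the two-part error concrete, here is a minimal base-R simulation of $\eqref{eq:1}$; all parameter values are hypothetical. Writing $\sigma_a^2$ and $\sigma_u^2$ for the variances of $a_i$ and $u_{it}$ (notation introduced here), the time-constant $a_i$ makes the composite error $a_i + u_{it}$ correlated across periods within an individual, with correlation $\sigma_a^2/(\sigma_a^2+\sigma_u^2)$. That within-individual correlation is what the random effects (FGLS) estimator accounts for.

```r
# Simulate from eq. (1): y_it = b0 + b1 * x_it + a_i + u_it
# All parameter values are hypothetical, chosen only for illustration.
set.seed(42)
n_id    <- 2000   # individuals, i = 1, ..., n
n_t     <- 5      # time periods, t = 1, ..., T
b0      <- 1
b1      <- 2
sigma_a <- 1.0    # sd of the time-constant individual effect a_i
sigma_u <- 0.5    # sd of the idiosyncratic error u_it

id <- rep(seq_len(n_id), each = n_t)     # long format, sorted by individual
x  <- rnorm(n_id * n_t)
a  <- rnorm(n_id, sd = sigma_a)[id]      # same value for every t within an i
u  <- rnorm(n_id * n_t, sd = sigma_u)    # new draw for every (i, t)
y  <- b0 + b1 * x + a + u

# Composite error v_it = a_i + u_it: one row per individual, one column per period
v <- matrix(a + u, nrow = n_id, byrow = TRUE)
rho_hat  <- cor(v[, 1], v[, 2])                   # correlation between two periods
rho_true <- sigma_a^2 / (sigma_a^2 + sigma_u^2)   # 0.8 with the values above

# Pooled OLS still recovers b1 here because a_i is drawn independently of x
fit <- lm(y ~ x)
c(rho_hat = rho_hat, rho_true = rho_true, b1_hat = unname(coef(fit)["x"]))
```

With these values the estimated correlation should land close to 0.8; pooled OLS remains consistent for $\beta_1$ in this setup, but its usual standard errors ignore this serial correlation.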
In R, I've been estimating a Random Effects version of this model with the plm package. First, the required packages and some data:
```r
# install.packages(c("plm", "lme4", "texreg", "mlmRev"), dependencies = TRUE)
data(egsingle, package = "mlmRev")
```
The data set egsingle is an unbalanced panel of 1721 school children, grouped in 60 schools, observed across five time points. For details, see ?mlmRev::egsingle.
Some light data management:
```r
dta <- egsingle
# 0/1 dummy: 1 if the child is female, 0 otherwise
dta$Female <- with(dta, ifelse(female == 'Female', 1, 0))
```
And a snippet of the relevant data:
```r
dta[118:127, c('schoolid', 'childid', 'math', 'year', 'size', 'Female')]
#>     schoolid   childid   math year size Female
#> 118     2040 289970511 -1.830 -1.5  502      1
#> 119     2040 289970511 -1.185 -0.5  502      1
#> 120     2040 289970511  0.852  0.5  502      1
#> 121     2040 289970511  0.573  1.5  502      1
#> 122     2040 289970511  1.736  2.5  502      1
#> 123     2040 292772811 -3.144 -1.5  502      0
#> 124     2040 292772811 -2.097 -0.5  502      0
#> 125     2040 292772811 -0.316  0.5  502      0
#> 126     2040 293550291 -2.097 -1.5  502      0
#> 127     2040 293550291 -1.314 -0.5  502      0
```
Here is how I specify the Random Effects model in R based on $\eqref{eq:1}$, ignoring schoolid, using plm(), which estimates it by FGLS:
```r
library(plm)
# Random effects model: childid is the individual index, year the time index
Model.1 <- plm(math ~ Female + size + year, data = dta,
               index = c("childid", "year"), model = "random")
# summary(Model.1)
```
However, as mentioned at the top, the data are also nested: childid is nested in schoolid. To write the regression equation for this structure, I have simply extended $\eqref{eq:1}$ with a school subscript, $s$:
$$ y_{ist} = \beta_{0}+\beta_{1}x_{ist} + a_{i}+\nu_{s}+u_{ist} \tag{2} \label{eq:2} $$
Now $y$, $x$ and the idiosyncratic error $u$ carry an additional school index $s$, and the composite error, which has two parts in $\eqref{eq:1}$, gains a third term, $\nu_{s}$, in $\eqref{eq:2}$. This term captures the unobserved group- (school-) specific component. I am not confident that this specification is correct; I may be confused by differences in jargon across the literature.
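One way to see what $\eqref{eq:2}$ implies is to write out the covariance structure of its composite error $v_{ist} = a_i + \nu_s + u_{ist}$. This sketch assumes the usual random effects conditions: $a_i$, $\nu_s$ and $u_{ist}$ have mean zero and constant variances $\sigma_a^2$, $\sigma_\nu^2$ and $\sigma_u^2$ (notation introduced here), and are uncorrelated with each other and with $x$. For children $i \neq j$, periods $t \neq r$ and schools $s \neq q$:

$$
\begin{aligned}
\operatorname{Var}(v_{ist}) &= \sigma_a^2 + \sigma_\nu^2 + \sigma_u^2,\\
\operatorname{Cov}(v_{ist}, v_{isr}) &= \sigma_a^2 + \sigma_\nu^2 \qquad \text{(same child, same school)},\\
\operatorname{Cov}(v_{ist}, v_{jsr}) &= \sigma_\nu^2 \qquad \text{(different children, same school)},\\
\operatorname{Cov}(v_{ist}, v_{jqr}) &= 0 \qquad \text{(different schools)}.
\end{aligned}
$$

Two measurements on the same child therefore share $\sigma_a^2 + \sigma_\nu^2$, while two different children in the same school share only $\sigma_\nu^2$; this is the child-within-school structure that a nested random intercept model represents.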
Part 1. Is $\eqref{eq:2}$ a correct way to write the regression equation for a repeated-measures random effects model with a nested structure? Is there authoritative literature that uses notation similar to this?
The next part, Part 2, is no longer very relevant.
I have tried to find a way to estimate what I believe is $\eqref{eq:2}$ using plm, but I have not succeeded.
Part 2. Is it possible to estimate a repeated-measures random effects model with a nested structure using the plm package?
Based on this question, I believe the answer to this part is yes: it is possible to estimate a repeated-measures random effects model with a nested structure using the plm package; see the question linked above.
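For reference, a sketch of what such a nested random effects fit could look like in plm. It assumes a plm version (2.0 or later) in which pdata.frame() accepts a third, group index (individual, time, group) and plm() accepts effect = "nested" with model = "random"; Model.1n is a hypothetical name, and its estimates are not shown here.

```r
# Sketch, ASSUMING plm >= 2.0 with support for nested random effects
library(plm)
data(egsingle, package = "mlmRev")
dta <- egsingle
dta$Female <- as.integer(dta$female == "Female")

# index = c(individual, time, group): childid nested in schoolid
pdta <- pdata.frame(dta, index = c("childid", "year", "schoolid"))

# Random effects with an individual and a group (school) component
Model.1n <- plm(math ~ Female + size + year, data = pdta,
                model = "random", effect = "nested")
summary(Model.1n)
```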
After studying this great answer by Robert Long, I have estimated a repeated-measures model with childid nested in schoolid using the lme4 package:
```r
# Treat year as a factor (one dummy per time point)
dta$year <- as.factor(dta$year)
library(lme4)
```
Because lme4 relies on a likelihood framework (lmer() uses restricted maximum likelihood, REML, by default), I begin by estimating a model similar to Model.1 above, for later comparison:
```r
Model.2 <- lmer(math ~ Female + size + year + (1 | childid), dta)
```
Then, following Robert Long's answer, I specified the nested model as
```r
Model.3 <- lmer(math ~ Female + size + year + (1 | schoolid/childid), dta)
```
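In lme4, the nested term (1 | schoolid/childid) is shorthand for (1 | schoolid) + (1 | schoolid:childid): a school intercept plus a child-within-school intercept, which is where the childid:schoolid variance in the output comes from. The screenreg output further down reports 1721 childid:schoolid groups, the same as the number of children, so each child appears in only one school, and (1 | schoolid) + (1 | childid) should then describe the same model. The sketch below checks both equivalences by comparing log-likelihoods; Model.3a and Model.3b are hypothetical names.

```r
library(lme4)
data(egsingle, package = "mlmRev")
dta <- egsingle
dta$Female <- as.integer(dta$female == "Female")
dta$year <- as.factor(dta$year)

# Number of distinct schools per child; all 1 means childid is unique across schools
schools_per_child <- tapply(as.character(dta$schoolid), as.character(dta$childid),
                            function(s) length(unique(s)))
all(schools_per_child == 1)

# Nested shorthand, its explicit expansion, and the version using childid alone
Model.3  <- lmer(math ~ Female + size + year + (1 | schoolid/childid), dta)
Model.3a <- lmer(math ~ Female + size + year + (1 | schoolid) + (1 | schoolid:childid), dta)
Model.3b <- lmer(math ~ Female + size + year + (1 | schoolid) + (1 | childid), dta)

# REML log-likelihoods; these should agree up to numerical tolerance
ll <- sapply(list(Model.3, Model.3a, Model.3b), function(m) as.numeric(logLik(m)))
ll
```

lmer() may warn that size is on a very different scale from the other predictors; that warning does not affect the comparison.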
Assuming Model.3 is correctly specified:
Part 3.a. What authoritative source would you recommend, preferably one with notation similar to Wooldridge (2013), that presents and discusses the regression equation for what I am estimating in Model.3?
Part 3.b. Is $\eqref{eq:2}$ actually what I am estimating in Model.3?
Below are the estimation results from the three models:
```r
# require(texreg)
texreg::screenreg(list(Model.1, Model.2, Model.3), digits = 3)
#> =============================================================================
#>                                    Model 1      Model 2      Model 3
#> -----------------------------------------------------------------------------
#> (Intercept)                        -2.671 ***   -2.669 ***   -2.693 ***
#>                                    (0.085)      (0.086)      (0.152)
#> Female                             -0.025       -0.025        0.008
#>                                    (0.046)      (0.047)      (0.042)
#> size                               -0.000 ***   -0.000 ***   -0.000
#>                                    (0.000)      (0.000)      (0.000)
#> year-1.5                            0.878 ***    0.876 ***    0.866 ***
#>                                    (0.059)      (0.059)      (0.059)
#> year-0.5                            1.882 ***    1.880 ***    1.870 ***
#>                                    (0.059)      (0.058)      (0.058)
#> year0.5                             2.575 ***    2.574 ***    2.562 ***
#>                                    (0.059)      (0.059)      (0.059)
#> year1.5                             3.149 ***    3.147 ***    3.133 ***
#>                                    (0.060)      (0.059)      (0.059)
#> year2.5                             3.956 ***    3.954 ***    3.939 ***
#>                                    (0.060)      (0.060)      (0.060)
#> -----------------------------------------------------------------------------
#> R^2                                 0.735
#> Adj. R^2                            0.735
#> Num. obs.                          7230         7230         7230
#> AIC                                             16855.629    16590.715
#> BIC                                             16924.489    16666.461
#> Log Likelihood                                  -8417.815    -8284.357
#> Num. groups: childid                            1721
#> Var: childid (Intercept)                         0.857
#> Var: Residual                                    0.334        0.334
#> Num. groups: childid:schoolid                                1721
#> Num. groups: schoolid                                        60
#> Var: childid:schoolid (Intercept)                             0.672
#> Var: schoolid (Intercept)                                     0.180
#> =============================================================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05
```
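If $\eqref{eq:2}$ and Model.3 describe the same model, with $\sigma_a^2$, $\sigma_\nu^2$ and $\sigma_u^2$ denoting the variances of $a_i$, $\nu_s$ and $u_{ist}$ (notation introduced here), then the three variance components reported for Model 3 can be read as estimates of those variances. A small base-R calculation, using the rounded numbers copied from the table above, gives the implied intra-class correlations:

```r
# Variance components copied from the screenreg() output for Model 3 (3 d.p.)
var_child  <- 0.672  # Var: childid:schoolid (Intercept), read as sigma_a^2
var_school <- 0.180  # Var: schoolid (Intercept), read as sigma_nu^2
var_resid  <- 0.334  # Var: Residual, read as sigma_u^2

total <- var_child + var_school + var_resid

# Correlation between two different children in the same school
icc_school <- var_school / total
# Correlation between two measurements on the same child
icc_child  <- (var_child + var_school) / total

round(c(total = total, icc_school = icc_school, icc_child = icc_child), 3)
# total = 1.186, icc_school = 0.152, icc_child = 0.718
```

Because the inputs are rounded, these correlations are approximate. Model 2's single childid variance (0.857) is also close to the sum of Model 3's child and school variances (0.672 + 0.180 = 0.852), which suggests that Model 2 absorbs the school component into the child-level variance.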
Wooldridge, Jeffrey M. (2013). Introductory Econometrics: A Modern Approach, 5th edition. South-Western College. ISBN: 9781285414645. URL: https://www.cengage.co.uk/books/9781111531041/
Female + size in your models but this is missing from your formulas (1) and (2). – amoeba Mar 16 '18 at 17:15
Female and size. Do you happen to know if (2) can be estimated using plm from the plm package? – Eric Fail Mar 16 '18 at 17:40
plm. – amoeba Mar 16 '18 at 17:50
Model 3 are indeed the same. Is that your entire question ...? – Ben Bolker Mar 17 '18 at 19:52
amoeba’s comment, answers almost the entire question. I am still looking for a textbook presentation of (2), preferably something that follows the notation style used above, or what source would you recommend? cf. Part 3.a above. A source that covers your point about $\beta_1$, more about how what is transposed, what assumptions change, what misspecification test is recommended, and related. Again, I very much appreciate you taking the time to comment! – Eric Fail Mar 17 '18 at 20:36