To begin, I want to address some efficiency concerns with your code. First, you already instantiated the proper Treatment and Post indicators to perform the "classical" difference-in-differences (DiD) analysis. It is not necessary to include the additive terms and also manually calculate the interaction inside of I(). Simply interact Treatment and Post and R will estimate the constituent terms of the interaction for you. You could also drop effect = "individual", since the default behavior of plm() is to introduce individual effects. Thus, you can achieve the same results with the following:
OLS <- lm(log(Lum) ~ Treatment*Post + Unemp + Illiteracy, data = Long)
FE <- plm(log(Lum) ~ Treatment*Post + Unemp + Illiteracy, index = c("Municipality"), data = Long, model = "within")
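To see why the shorter formula is equivalent, note that R's * operator expands into the main effects plus the interaction. A minimal sketch on simulated data (all variable names here are illustrative, not your actual data):

```r
set.seed(1)
d <- data.frame(
  Treatment = rep(0:1, each = 50),
  Post      = rep(0:1, times = 50)
)
d$y <- 1 + 0.5*d$Treatment + 0.3*d$Post + 0.7*d$Treatment*d$Post + rnorm(100)

# Treatment*Post expands to Treatment + Post + Treatment:Post
m1 <- lm(y ~ Treatment*Post, data = d)
m2 <- lm(y ~ Treatment + Post + Treatment:Post, data = d)
all.equal(coef(m1), coef(m2))  # TRUE
```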
My issue now is that if I perform a DiD regression on luminosity, I get identical coefficients regardless of whether I include individual FE or not.
It is entirely plausible to observe this when your panel consists of two well-defined treatment/control groups and two well-defined before/after periods. Thus, you can reestimate the former equation with municipality fixed effects and the interaction term (i.e., Treatment*Post) in the same model. This should yield an identical DiD coefficient.
It should be noted, however, that the two equations will not produce similar results once you deviate from this setting. For example, suppose treatment adoption was staggered across municipalities; some municipalities get treated in earlier time periods while others get treated in later ones. Or, suppose a subset of treated municipalities withdraw from treatment permanently while others do not. Or, suppose municipalities move into and out of a "treated" status multiple times. Or, suppose you observe multiple treatment groups and each group receives a different "dose" (intensity) of treatment. You can clearly see the many departures from the traditional setting. In more complex settings, there is no guarantee the results from these two models will be similar. But I digress.
To help with your intuition, I encourage you to estimate the two models below. The former uses lm() to estimate your DiD equation with municipality fixed effects (i.e., dummies); the latter uses plm() to estimate the same equation via the within transformation. Because treatment timing is standardized, both will produce identical estimates:
OLS <- lm(log(Lum) ~ as.factor(Municipality) + Treatment*Post + Unemp + Illiteracy, data = Long)
FE <- plm(log(Lum) ~ as.factor(Municipality) + Treatment*Post + Unemp + Illiteracy, index = c("Municipality"), data = Long, model = "within")
Note, as.factor(Municipality) results in the estimation of dummies for all municipalities. This is algebraically equivalent to estimation in deviations from means. In the former equation using lm(), Treatment will be dropped, showing up as NA in your output summary; it is collinear with the fixed effects. This should not concern you. Its removal does not affect the coefficient on your interaction term.
In the latter equation, the inclusion of these dummies is redundant, as you already specified model = "within". Your model summary using plm() will be much cleaner. In fact, the output from summary(FE) will omit Treatment for you. Either way, your estimates will be identical.
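If you want to convince yourself before touching your own data, here is a self-contained sketch on a simulated balanced panel (all names and parameter values are made up for illustration; it assumes the plm package is installed):

```r
library(plm)

set.seed(42)
# 30 municipalities observed over 4 months; half are treated, months 3-4 are "post"
sim <- expand.grid(Municipality = 1:30, Month = 1:4)
sim$Treatment <- as.numeric(sim$Municipality <= 15)
sim$Post      <- as.numeric(sim$Month > 2)
sim$y <- 1 + 0.5*sim$Treatment + 0.2*sim$Post +
         0.7*sim$Treatment*sim$Post + rnorm(nrow(sim))

# LSDV: explicit municipality dummies
ols <- lm(y ~ as.factor(Municipality) + Treatment*Post, data = sim)

# Within estimator: same model, demeaned
fe <- plm(y ~ Treatment*Post, index = "Municipality",
          data = sim, model = "within")

# The interaction coefficient is the same in both
coef(ols)["Treatment:Post"]
coef(fe)["Treatment:Post"]
```

The two numbers should agree to machine precision, since the dummy-variable and within estimators are algebraically equivalent.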
Addressing concerns in the comments section...
So as I understand it, my FE estimation is a generalization of my OLS estimation?
Correct.
But you write that it's not surprising that coefficients are similar. However, mine are identical, does that make a difference?
No.
Also, the fact that FE & OLS estimates differ if a number of observations are left out randomly confuses me. Is it because then the number of treated months is not identical for all units/treatment does not start at the same time for every municipality?
Yes. You indicated in your post that you randomly removed 10% of your observations (rows). This could remove relevant pre- or post-exposure months. In your full panel, all municipalities might have two post-treatment months. In the abridged panel, a subset of municipalities might have only one—or none. Deliberately discarding observations at random creates an unbalanced panel, and could most certainly impact your point estimates.
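You can inspect the damage directly. A quick diagnostic on your data frame Long (assuming the column names from your post):

```r
set.seed(7)
# Keep a random 90% of rows, mimicking your deletion
Long_sub <- Long[sample(nrow(Long), size = floor(0.9 * nrow(Long))), ]

# Distribution of months retained per municipality; if counts differ
# across municipalities, the panel is now unbalanced
table(table(Long_sub$Municipality))
```

In a balanced panel this table has a single entry; after random deletion you will typically see several.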
Other thoughts...
Your model also generalizes to an equation which includes dummies for all units and all time periods. In lm(), simply include a full set of dummies for municipalities and a full set of dummies for months.
The following equations should produce similar results as well:
OLS <- lm(log(Lum) ~ as.factor(Municipality) + as.factor(Month) + Treatment*Post + Unemp + Illiteracy, data = Long)
FE <- plm(log(Lum) ~ Treatment*Post + Unemp + Illiteracy, index = c("Municipality", "Month"), data = Long, model = "within", effect = "twoways")
Suppose you have 30 municipalities observed over 36 months. The former model using lm() results in the estimation of 29 separate municipality effects and 35 separate month effects. Here, Treatment is dropped as it is collinear with the municipality fixed effects; Post is dropped as it is collinear with the month fixed effects.
The latter model using plm() does this in one shot if you include both the unit and time indexes; this will require you to create a month variable in R (e.g., Jan-2018, Feb-2018, etc.). If you add effect = "twoways" to your model, then the summary will display only the coefficient on your interaction term; the constituent terms are absorbed by the fixed effects.
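If you do not already have a month variable, one way to build it (assuming a Date-class column, here given the hypothetical name date) is:

```r
# Collapse daily dates to a "Mon-Year" label, e.g., "Jan-2018"
# (the column name `date` is an assumption about your data)
Long$Month <- format(Long$date, "%b-%Y")
```

Any variable that uniquely identifies the time period works as the second index; the label format itself is immaterial to plm().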
Try out these different specifications and see if they give you the same DiD estimate. They should!