
I am new to multilevel models and am having trouble understanding how to include covariates. In my model, I have Industry and Country as two grouping factors. To determine the effect of my focal variable X on Y, I have to control for the effects of the following two covariates: IndustryProfitability and CountryRiskRatings. Which of the following would you think is the most appropriate model specification for this purpose?

Model 1 = lmer(Y ~ X + IndustryProfitability + CountryRiskRatings, myData)

Model 2 = lmer(Y ~ X + (1|Industry)+ (1|Country), myData)   

Model 3 = lmer(Y ~ X + (1|Industry)+ (1|Country) + IndustryProfitability + CountryRiskRatings, myData)

I was told that Model 3 is the most appropriate because there is non-independence in my data due to industry and country. But I am confused, because the two covariates (e.g., IndustryProfitability) that relate to a higher level (e.g., Industry) are included as fixed effects in Model 3. Are Model 1 and Model 2 better, or are they misspecified?

Robert Long
  • What exactly is the question that you are setting out to ask with this exercise? Model three is asking, what is the relationship between your response Y and the covariates X, IndustryProfitability and CountryRiskRating, after removing any influence of the industry/country. Does that sound like the question you want to ask your data? – André.B Feb 14 '19 at 00:52
  • The question I am interested in is whether X predicts Y after controlling for all other factors. I am not particularly interested in whether IndustryProfitability and CountryRiskRating predict Y. But I need to control for them as they are likely to be correlated with X. – SanMelkote Feb 14 '19 at 00:58
  • Given what you are asking, how can you possibly consider either models 1 or 2? They don't involve half the variables you are talking about. You could ask whether Model 3 is specified correctly, but the other two don't make any sense as alternatives given what you've asked. – Bryan Krause Feb 14 '19 at 01:34
  • How are your extra factors arranged? Are either IndustryProfitability or CountryRiskRating nested within Industry/Country? What kind of variables (i.e. numeric, factor, etc.) are IndustryProfitability or CountryRiskRating? – André.B Feb 14 '19 at 03:24
  • Industry (10 different types) and Country (90 different types) are dummy codes. Both IndustryProfitability (one value for each industry) and CountryRiskRating (one value for each country) are numeric. Could you please let me know what you mean by nesting of IndustryProfitability or CountryRiskRating within Industry/Country? – SanMelkote Feb 14 '19 at 03:48

1 Answer

>  Model 1 = lmer(Y ~ X + IndustryProfitability + CountryRiskRatings, myData)

This is not a mixed effects model, because the formula contains no random effects term, and it will not run. It will return an error:

Error: No random effects terms specified in formula
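As a rough illustration of this point, the sketch below passes the Model 1 formula to `lmer()` and then to `lm()`. The data frame `toy` and all of its values are made up for the example; only the variable names follow the question. Note that `lm()` runs but ignores the clustering by industry and country, so it is not a substitute for a mixed model here.

```r
library(lme4)

set.seed(42)
# Hypothetical toy data; only the variable names follow the question
toy <- data.frame(
  X = rnorm(50),
  IndustryProfitability = rnorm(50),
  CountryRiskRatings = rnorm(50)
)
toy$Y <- 1 + 2 * toy$X + rnorm(50)

f <- Y ~ X + IndustryProfitability + CountryRiskRatings

# lmer() stops because the formula has no (... | group) term;
# tryCatch() captures the error message instead of aborting
msg <- tryCatch(
  { lmer(f, data = toy); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same formula is an ordinary regression, which lm() fits
fit_lm <- lm(f, data = toy)
print(coef(fit_lm))
```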

Then we have:

>  Model 2 = lmer(Y ~ X + (1|Industry)+ (1|Country), myData)   

This will fit a linear mixed effects model and will, subject to convergence, estimate the fixed effect of X while accounting for the non-independence of observations within the clusters. However, it will not be adjusted for IndustryProfitability or CountryRiskRatings, which is the whole point of the question.

So we can disregard models 1 and 2.

This leaves:

>  Model 3 = lmer(Y ~ X + (1|Industry)+ (1|Country) + IndustryProfitability + CountryRiskRatings, myData)

This will indeed adjust for the 2 covariates. The fact that they vary only at the cluster level does not matter: each one is constant within its cluster, so it will automatically be handled at the correct level. However, I advise a little caution here. If the 2 covariates are a cause, or a proxy of a cause, of both X and Y, and neither is a cause of the other, then they are potential confounders and should be adjusted for. However, if either of them is on the causal pathway from X to Y, then it is a mediator and should not be adjusted for if your model is to be used for inference. If it is a predictive model only, then this is not so important.
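To make this concrete, here is a minimal simulation sketch of Model 3 with crossed random intercepts and one covariate value per cluster. Everything in it is an assumption made for illustration: the sample size, the effect sizes (a true coefficient of 2 for X), the variance components, and the way the two covariates are made to confound X and Y. Only the variable names and the cluster counts (10 industries, 90 countries) come from the question.

```r
library(lme4)

set.seed(1)
n_ind <- 10    # industries, as in the question
n_cty <- 90    # countries, as in the question
n     <- 2000  # hypothetical number of observations

# One covariate value per cluster (simulated, not real data)
ind_prof <- rnorm(n_ind)
cty_risk <- rnorm(n_cty)

# Random intercepts for the two crossed grouping factors
u_ind <- rnorm(n_ind, sd = 0.5)
u_cty <- rnorm(n_cty, sd = 0.5)

# Assign each observation to an industry and a country
ind <- sample(n_ind, n, replace = TRUE)
cty <- sample(n_cty, n, replace = TRUE)

# The covariates influence both X and Y, so they act as confounders
X <- 0.5 * ind_prof[ind] + 0.5 * cty_risk[cty] + rnorm(n)
Y <- 1 + 2 * X + ind_prof[ind] - cty_risk[cty] +
  u_ind[ind] + u_cty[cty] + rnorm(n)

myData <- data.frame(
  Y, X,
  Industry = factor(ind),
  Country = factor(cty),
  IndustryProfitability = ind_prof[ind],
  CountryRiskRatings = cty_risk[cty]
)

# Model 3: cluster-level covariates as fixed effects, clusters as random intercepts
model3 <- lmer(
  Y ~ X + IndustryProfitability + CountryRiskRatings +
    (1 | Industry) + (1 | Country),
  data = myData
)
print(fixef(model3))
```

In a setup like this, the estimate for X should land close to the simulated value, while the coefficients of the two cluster-level covariates are estimated far less precisely, because they are informed by only 10 and 90 distinct values respectively.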

Robert Long
  • Thank you Robert. I now understand that Model 3 is the most appropriate. – SanMelkote Feb 14 '19 at 14:26
  • @Robert Long: does the answer you wrote apply to this question here? https://stats.stackexchange.com/questions/549563/data-with-hierarchical-structure-and-multicollinearity-e-g-zip-postal-codes thank you ! – stats_noob Oct 31 '21 at 22:48