I am analyzing a large dataset using generalized linear mixed-effects models (GLMMs). The data have a 3-level structure: roughly 400,000 individuals nested within about 500 neighborhoods, which are in turn nested within about 30 regions.
I have tried fitting the model in several software packages, including R's lme4 package, SPSS, and Stata. None of these attempts succeeded: the computations either run for an exceptionally long time or end with errors such as "convergence failed."
Interestingly, when I fit a fixed-effects model instead of a mixed-effects model, the results seem meaningful, which makes me reluctant to abandon this line of analysis. Unfortunately, I don't have access to high-end computing resources at the moment.
Has anyone encountered similar challenges? I would greatly appreciate any software recommendations or alternative methods for conducting this type of complex analysis without requiring high computational power.
Thank you for your time and expertise.
 Family: nbinom2  ( log )
Formula:          phq ~ 1 + (1 | level3/level2)
Data: data_raw

      AIC       BIC    logLik  deviance  df.resid
1517727.3 1517771.0 -758859.6 1517719.3    406760

Random effects:

Conditional model:
 Groups        Name        Variance Std.Dev.
 level2:level3 (Intercept) 0.08246  0.28715
 level3        (Intercept) 0.00961  0.09803
Number of obs: 406764, groups:  level2:level3, 510; level3, 34

Dispersion parameter for nbinom2 family (): 0.621

Conditional model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.64588    0.02252   28.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I used the glmmTMB package, and this is the summary of the null model. @RobertLong
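As a reading aid, the printed estimates can be moved from the log-link scale to the response scale. This is only a rough sketch of the arithmetic, using the values in the summary and nothing else. It assumes glmmTMB's documented nbinom2 parameterization, in which the variance is $\mu(1 + \mu/\phi)$ with $\phi$ the reported dispersion parameter, and it evaluates everything for a cluster whose random effects are zero.

```latex
\begin{align*}
% Expected phq count for a cluster with both random intercepts at zero
\hat{\mu} &= \exp(0.64588) \approx 1.908 \\[4pt]
% nbinom2 variance at that mean, with \phi = 0.621
\widehat{\mathrm{Var}}(y \mid \hat{\mu}) &= \hat{\mu}\left(1 + \frac{\hat{\mu}}{\phi}\right)
  = 1.908\left(1 + \frac{1.908}{0.621}\right) \approx 7.77 \\[4pt]
% Share of the random-intercept variance (log scale) at the level2:level3 level
\frac{\sigma^2_{\text{level2:level3}}}{\sigma^2_{\text{level2:level3}} + \sigma^2_{\text{level3}}}
  &= \frac{0.08246}{0.08246 + 0.00961} \approx 0.90
\end{align*}
```

On this reading, most of the between-cluster variation on the log scale sits at the level2-within-level3 level rather than at level3. The last ratio compares only the two random-intercept variances and is not an intraclass correlation for the count outcome.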
As for the computational challenges, the model didn't converge even after running for an extended period. Specifically, I left the computer running the calculations for over six hours, only to find that the process had failed. It has been quite a frustrating experience.
– Wernicke Sep 11 '23 at 14:00
(1 | Region/Neighborhood). No random slope, only random intercept. The 30 fixed effects were chosen via elastic net and forward selection methods. (may be due to the large dataset) The model with only random effects converged, suggesting issues with the fixed effects. Any further feature selection advice would be appreciated. – Wernicke Sep 12 '23 at 01:57
summary(mod) of the fitted model (the one with no fixed effects) – Robert Long Sep 12 '23 at 08:39
summary(mod) to my answer. Thanks for being so helpful. – Wernicke Sep 12 '23 at 15:57