I have 40-year follow-up data on children who had a birth risk. I would like to see whether a certain risk that was present at birth affects intelligence at 40 years (this risk may or may not cause damage to the individual). Parental socioeconomic status (SES) has been found to covary with adult intelligence in countless studies, i.e. subjects from low childhood SES groups get lower IQ scores because they are less educated, not because they are less intelligent. The birth risk I'm interested in is unevenly distributed: 70% of risk cases are in the low SES group, while less than 40% of the non-risk (control) cases are. So it appears to me that childhood (= parents') SES is here both a confounder and a covariant. I have 75 risk cases and 100 controls, and I'm concerned about losing what power I have if I stratify. If I perform a GLM with SES as a covariate, I think I would violate linearity, homogeneity, and independence. Running the analysis both with and without SES does not seem to be an option, because I would have to discuss whether a faulty analysis is more informative than an incomplete one. Still, almost every referee would object if I omitted SES. Am I at all on the right track?

I would greatly appreciate any suggestions on how to handle this!

Please excuse the long question!

1 Answer

Terminology

I'm not sure what you mean by 'covariant'. A covariate would be a feature of your cases, usually a pre-treatment one, i.e. causally prior to the 'certain risk' rather than one of its effects. A confounder would be something that has a causal effect both on the 'certain risk' and on the final IQ measurement. Parental SES would be a (non-confounding) predictor if it causes IQ but does not cause the 'certain risk'.

The causal questions

Adding parental SES if it is a confounder would seem fairly obligatory, since otherwise the coefficient on the 'certain risk' won't identify the effect of the 'certain risk'. That would be an argument for adding it.
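To make the confounding argument concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not your data: the sample size, the probabilities of the risk in each SES group, the 8-point SES effect, and the names `simulate` and `ols_coefs` are all made up. The true effect of the risk on IQ is set to zero, so any non-zero coefficient on the risk in the unadjusted model is pure confounding.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate(n=50_000, seed=0):
    """Hypothetical data: low SES raises the chance of the risk and lowers IQ."""
    rng = np.random.default_rng(seed)
    low_ses = rng.random(n) < 0.5
    # Assumed: the risk is twice as common in the low SES group.
    risk = rng.random(n) < np.where(low_ses, 0.6, 0.3)
    # Assumed: low SES costs 8 IQ points; the risk itself has NO effect.
    iq = 100.0 - 8.0 * low_ses + 0.0 * risk + rng.normal(0, 15, n)
    return iq, risk.astype(float), low_ses.astype(float)

iq, risk, low_ses = simulate()
naive = ols_coefs(iq, risk)[1]              # IQ ~ risk
adjusted = ols_coefs(iq, risk, low_ses)[1]  # IQ ~ risk + SES
print(f"risk coefficient without SES: {naive:.2f}")
print(f"risk coefficient with SES:    {adjusted:.2f}")
```

Under these assumptions the coefficient without SES should come out clearly negative even though the risk does nothing, while the coefficient with SES in the model should sit close to zero.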

Adding parental SES if it is a non-confounder, i.e. it does not cause the 'certain risk' but does cause the final IQ score, would be optional on causal grounds. However, see below for other reasons to include it.

The estimation questions

Power is indeed an issue with so few cases. However, power depends on the size of the effect of your 'certain risk', which you can't change. By adding parental SES you will estimate one more parameter, which will cost you some power, but if parental SES is very predictive of the outcome then this may more than make up for that: if your model is better overall, this should filter down to your being more confident of the effect of the 'certain risk' too. That would be an argument for adding it.
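The trade-off can be sketched by simulation. The group sizes below follow the question (75 risk cases, 100 controls, roughly 70% and 40% in the low SES group), but the effect sizes, the noise level, and the helper name `risk_se` are assumptions I made up for illustration. The sketch compares the average standard error of the risk coefficient with and without SES in the model.

```python
import numpy as np

def risk_se(y, X):
    """Standard error of the coefficient on column 1 of X (the risk indicator)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # classical OLS covariance
    return float(np.sqrt(cov[1, 1]))

# Fixed design: 75 risk cases (53 low SES), 100 controls (40 low SES).
risk = np.r_[np.ones(75), np.zeros(100)]
low_ses = np.r_[np.ones(53), np.zeros(22), np.ones(40), np.zeros(60)]
ones = np.ones(risk.size)
X_without = np.column_stack([ones, risk])
X_with = np.column_stack([ones, risk, low_ses])

rng = np.random.default_rng(1)
se_without, se_with = [], []
for _ in range(500):
    # Assumed effects: risk -5 points, low SES -15 points, noise SD 10.
    iq = 100 - 5 * risk - 15 * low_ses + rng.normal(0, 10, risk.size)
    se_without.append(risk_se(iq, X_without))
    se_with.append(risk_se(iq, X_with))

mean_se_without = float(np.mean(se_without))
mean_se_with = float(np.mean(se_with))
print(f"mean SE of risk coefficient without SES: {mean_se_without:.2f}")
print(f"mean SE of risk coefficient with SES:    {mean_se_with:.2f}")
```

With an SES effect this strong, the drop in residual variance outweighs both the extra parameter and the overlap between SES and the risk, so the standard error with SES should be the smaller one. With a weak SES effect the gain would shrink or reverse. I compare only standard errors here, because in this setup the estimate without SES is also biased.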

I'm not sure why you think adding parental SES would violate "linearity, homogeneity, and independence", and to be honest I think it would be a smaller problem even if it did.

You called it a GLM, but it seems to me that a regular linear regression model should do fine. But perhaps that's just a terminological thing.