
I am currently conducting a meta-analysis and have pooled the prevalence of a certain disease. I would like to check for any association between risk factors, such as gender, ethnicity, and disease classification (all of which I have input as proportions), and this prevalence. May I ask what the best way forward is?

Aaron Mai

1 Answer


Suppose we are interested in a model such as

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $y$ and $x$ are measurements taken on a number of samples. Now, if we introduce a third variable $z$, such as the number of subjects in each sample or the size of each population, and we form another model so that we are dealing with proportions, we could have

$$\frac{y}{z} = \beta_0 + \beta_1 \frac{x}{z} + \varepsilon$$

It should now be obvious that, since $z$ appears in the denominator on both sides, the two sides are "coupled"; hence the term mathematical coupling. A sample with a small $z$ inflates both $y/z$ and $x/z$ simultaneously, so the two ratios can appear correlated even when $x$ and $y$ are independent.

A simple example in R shows this. For simplicity, we simulate the variables independently from a standard normal distribution, starting with $x$ and $y$:

> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100)
> cor(x,y)
[1] -0.0009943199

...so the correlation is close to zero. Or in linear regression:

> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8768 -0.6138 -0.1395  0.5394  2.3462 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.03769    0.09699  -0.389    0.698
x            -0.00106    0.10773  -0.010    0.992

Residual standard error: 0.9628 on 98 degrees of freedom
Multiple R-squared:  9.887e-07,  Adjusted R-squared:  -0.0102
F-statistic: 9.689e-05 on 1 and 98 DF,  p-value: 0.9922

...so the estimates are close to zero, and so is $R^2$.

Now we introduce a third variable:

> z <- rnorm(100)
> cor(x/z, y/z)
[1] 0.9168795

and suddenly the correlation is above 0.9. Or in regression:

> summary(lm(I(y/z) ~ I(x/z)))

Call:
lm(formula = I(y/z) ~ I(x/z))

Residuals:
    Min      1Q  Median      3Q     Max 
-45.996  -4.733  -2.784  -1.524 214.929 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.74090    2.53884    1.08    0.283    
I(x/z)       1.44965    0.06375   22.74   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.35 on 98 degrees of freedom
Multiple R-squared:  0.8407,  Adjusted R-squared:  0.839
F-statistic: 517.1 on 1 and 98 DF,  p-value: < 2.2e-16

...and the estimate for the slope is far from zero with a very small p-value, and $R^2$ is 0.8407, which is $0.9168795^2$, the square of the correlation computed above.
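
Since this is a simple regression with a single predictor, the multiple $R^2$ is just the squared Pearson correlation, so the two results can be checked against each other directly:

> cor(x/z, y/z)^2   # ≈ 0.8407, matching the Multiple R-squared above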

It is worth noting that this example is rather extreme, because all the variables are standard normal: with a mean-zero denominator, $1/z$ takes extreme values whenever $z$ is near zero, and these dominate both ratios, so the coupling effect is close to its worst case. When the variables are on different scales, have different variances, are of different types, or are correlated with each other, the effect of mathematical coupling is less pronounced, but it is still present.
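
As a minimal sketch of that point (not part of the original example; `z2` is a new variable introduced just for this check), give the shared denominator a mean far from zero, so that its relative variability is small; dividing by it then distorts $x$ and $y$ much less:

> z2 <- rnorm(100, mean = 10, sd = 1)  # denominator kept well away from zero
> cor(x/z2, y/z2)                      # typically close to cor(x, y), i.e. near zero

Because $1/z2$ is now nearly constant, the shared-denominator effect almost disappears.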

So extreme caution is advised when analysing associations between proportions or ratios that share a denominator.

Robert Long