If a binary variable is coded as 0 and 1, then its mean is the proportion of 1s. Many people compare proportions using a t-test of means. For instance, to compare the proportion of people enrolled in school between groups, we can use -ttest enrollment, by(group)- in Stata. Another example is regression, where binary variables are used as predictors or as the outcome variable (the latter in a Linear Probability Model).

Is this problematic? Why is it so common? When is it NECESSARY to use a test of a difference in proportions instead of a t-test of a difference in means?

Hutchins
  • The t-test isn't quite correct, as you can see from the calculations at https://stats.stackexchange.com/a/159220/919. Intuitively, a test of proportions does not need an independent estimate of a standard deviation and therefore the correct division is by $n$ rather than $n-1.$ However, if I recall correctly, the Student t distribution is a better one to use for computing the p-value. This is usually ignored because applying the t-test is justified only with relatively large datasets, where there's negligible difference between the Student t and standard Normal distribution. – whuber May 23 '20 at 17:10
  • Thanks, that answers that. So how does this work in sample size calculations using power analysis? If it's justified to use a t-test for large n, why can't I calculate marginal proportion change using the two-sample means formula? – Hutchins May 23 '20 at 17:21
  • It depends on who does the power analysis. Many use a Normal approximation because it leads to easy analytical solutions. – whuber May 23 '20 at 17:54
  • Well let me ask this. How would you do it? You are running a linear probability model and you want to power the study for a 10% marginal effect. How would you do it? What program would you use? – Hutchins May 23 '20 at 17:56
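
For readers who want to try the comparison raised in these comments, here is a minimal sketch in R using power.prop.test and power.t.test from the base stats package. Only the 10-point difference comes from the comment above; the baseline of 0.50 rising to 0.60, the 80% power, the 5% level, and the choice of standard deviation plugged into the means formula are illustrative assumptions, not anyone's recommended procedure.

```r
# Assumed scenario (hypothetical values): baseline 0.50, alternative 0.60
p1 <- 0.50
p2 <- 0.60

# Per-group n from the two-proportion calculation (normal approximation)
pp <- power.prop.test(p1 = p1, p2 = p2, sig.level = 0.05, power = 0.80)

# Per-group n from the two-sample means calculation, plugging in the
# average of the two Bernoulli variances as the common SD (an assumption)
sd_binary <- sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
pm <- power.t.test(delta = p2 - p1, sd = sd_binary,
                   sig.level = 0.05, power = 0.80)

# Required sample size per group under each calculation
c(n_prop = pp$n, n_means = pm$n)
```

Under these assumptions the two per-group sample sizes should come out close to each other, which is consistent with the remark that a Normal approximation is often used for convenience.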

1 Answer

Suppose we have 800 school-age subjects in Group A, of whom 683 are enrolled in school. And in Group B suppose corresponding numbers are 1001 out of 1100.

Test of proportions. A test of proportions, prop.test in R, gives the following results. (This test uses the normal distribution to approximate binomial probabilities, but the sample sizes here are large enough for a good approximation.)

prop.test(c(683,1001), c(800,1100))

        2-sample test for equality of proportions 
        with continuity correction

data:  c(683, 1001) out of c(800, 1100)
X-squared = 13.991, df = 1, p-value = 0.0001837
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.08708815 -0.02541185
sample estimates:
 prop 1  prop 2 
0.85375 0.91000 

It is not clear whether the continuity correction is appropriate for such large samples. Here is the (very slightly smaller) P-value without the correction.

prop.test(c(683,1001), c(800,1100), correct=FALSE)$p.value
[1] 0.00013692
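
As a cross-check on what the uncorrected test computes, the following sketch reproduces it by hand as a pooled two-sample z-test. The variable names are made up for illustration; the counts are the ones from the example above. It relies on the standard identity that the squared pooled z statistic equals the Pearson chi-squared statistic for a 2 x 2 table.

```r
# Counts taken from the example above
x1 <- 683;  n1 <- 800    # Group A: enrolled, total
x2 <- 1001; n2 <- 1100   # Group B: enrolled, total

p1 <- x1 / n1
p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 == p2

# Pooled z statistic: the binomial variance p(1 - p) is determined by the
# proportion itself, so no separate standard deviation estimate is needed
se0 <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z   <- (p1 - p2) / se0
p_z <- 2 * pnorm(-abs(z))         # two-sided normal P-value

# Compare with prop.test() without the continuity correction
pt <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
c(z_squared = z^2, X_squared = unname(pt$statistic))
c(p_by_hand = p_z, p_prop_test = pt$p.value)
```

The two entries in each printed pair should agree, which shows that the uncorrected prop.test is the familiar normal-approximation z-test in chi-squared form.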

Using a two-sample t test. If we use a t test on the same data, then we are comparing samples x and y: In x for Group A we have 683 $1$'s and the rest of the observations are $0$'s. In y, we have 1001 $1$'s and the rest are $0$'s. The t test is only approximate because data are not normal, but the sample size is large enough for the P-value to be a good approximation.

The P-value is about 0.0002 or 0.0001 either way, so there is strong evidence that the groups differ.

x = rep(1:0, c(683, 800-683))
y = rep(1:0, c(1001,1100-1001))

table(x)
x
  0   1 
117 683 

table(y)
y
   0    1 
  99 1001 

t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -3.7026, df = 1495.5, p-value = 0.0002211
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.08604969 -0.02645031
sample estimates:
mean of x mean of y 
  0.85375   0.91000 

The P-value is about 0.0002, leading to the same interpretation as for the test of proportions.
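
To see why the two tests land so close together, here is a small sketch (variable names are illustrative) that rebuilds the Welch t statistic from the two proportions alone. For 0/1 data the sample variance equals p(1 - p) multiplied by n/(n - 1), so the only difference from an unpooled z statistic is a divisor of n - 1 instead of n.

```r
# Same 0/1 data as in the example above
x <- rep(1:0, c(683, 800 - 683))
y <- rep(1:0, c(1001, 1100 - 1001))
n1 <- length(x); n2 <- length(y)
p1 <- mean(x);   p2 <- mean(y)

# Sample variance of 0/1 data (divisor n - 1) written in terms of p
c(var_x = var(x), from_p = p1 * (1 - p1) * n1 / (n1 - 1))

# Welch standard error: each term var / n reduces to p(1 - p) / (n - 1)
se_welch  <- sqrt(p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1))
t_by_hand <- (p1 - p2) / se_welch

# Unpooled z statistic: same form, but dividing by n
se_z <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_unpooled <- (p1 - p2) / se_z

c(t_by_hand  = t_by_hand,
  t_test     = unname(t.test(x, y)$statistic),
  z_unpooled = z_unpooled)
```

With samples this large, replacing n - 1 by n barely changes the statistic, and the t distribution with about 1500 degrees of freedom is nearly standard normal.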

I would do the test of proportions, but would not want to disparage use of the t test.

BruceET
  • Aren't any binaries in a regression evaluated with a t-test? Is there a way to do proportion tests in a regression? – Hutchins May 24 '20 at 15:10
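
As one way to explore the regression question on the same data, the sketch below fits a linear probability model with lm and a logistic regression with glm. The data frame and variable names are made up for illustration. The t statistic for the group coefficient in the linear model coincides with the pooled-variance two-sample t-test, while the logistic model reports a Wald z statistic on the log-odds scale.

```r
# Same 0/1 data as in the answer, arranged as a data frame
x <- rep(1:0, c(683, 800 - 683))      # Group A
y <- rep(1:0, c(1001, 1100 - 1001))   # Group B
d <- data.frame(
  enrolled = c(x, y),
  group    = factor(rep(c("A", "B"), c(length(x), length(y))))
)

# Linear probability model: the slope is the difference in proportions
lpm   <- lm(enrolled ~ group, data = d)
t_lpm <- coef(summary(lpm))["groupB", "t value"]

# Pooled-variance t-test (B minus A, to match the sign of the slope)
t_pooled <- unname(t.test(y, x, var.equal = TRUE)$statistic)

# Logistic regression: Wald z-test for the group effect on the log-odds
logit   <- glm(enrolled ~ group, family = binomial, data = d)
z_logit <- coef(summary(logit))["groupB", "z value"]

c(slope = unname(coef(lpm)["groupB"]), diff_in_props = mean(y) - mean(x))
c(t_lpm = t_lpm, t_pooled = t_pooled, z_logit = z_logit)
```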