
I wanted to study the distribution of BMI by age, so I made age groups at random and ran a chi-square test. Is this appropriate?

ocram
user53740
  • Nick gives a good answer with some good comments below. Among the first articles addressing the biases that result from aggregation is Gehlke, C. E. and Biehl, K. (1934). Certain Effects of Grouping Upon the Size of the Correlation Coefficient in Census Tract Material. Journal of the American Statistical Association, 29:169–170, which is an easy read, with simulations that are easy to reproduce on today's computers. Aggregation should make one think of cross-level fallacies. – Alexis Aug 28 '14 at 15:59

1 Answer


No. Age is a continuous variable and should not be binned (if this can be avoided). Same for BMI. Most $\chi^2$ tests treat variables as nominal, which would waste a lot of information in your case (assuming you have continuous data).

If you want to estimate the association between BMI and age, consider calculating Pearson's r or Kendall's $\tau$. If you want to test whether BMI is distributed similarly (homoskedastic) across the range of age in your data, consider the Breusch–Pagan test.
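For anyone who wants to see what these three quantities involve, here is a minimal standard-library Python sketch. The `ages` and `bmi` values are made-up illustration data, and the function names are my own. The heteroskedasticity check is Koenker's studentized variant of the Breusch–Pagan statistic ($n R^2$ from regressing the squared residuals on age), not the original form, and it assumes a single predictor and residuals that are not all identical. In practice a statistics package is the better choice.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed by brute force over all pairs (O(n^2))."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither term
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom


def breusch_pagan_koenker(x, y):
    """Koenker's studentized Breusch-Pagan statistic for one predictor.

    Returns (LM, p-value), where LM = n * R^2 from regressing the squared
    OLS residuals on x, referred to a chi-square distribution with 1 df.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    sq_resid = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    # With a single regressor, R^2 of the auxiliary regression equals r^2.
    lm = n * pearson_r(x, sq_resid) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Hypothetical data, for illustration only.
ages = [20, 30, 40, 50, 60]
bmi = [21.0, 23.5, 22.8, 26.1, 27.4]

r = pearson_r(ages, bmi)
tau = kendall_tau_b(ages, bmi)
lm, p = breusch_pagan_koenker(ages, bmi)
print(f"Pearson r = {r:.3f}, Kendall tau-b = {tau:.3f}")
print(f"Koenker LM = {lm:.3f}, p = {p:.3f}")
```

Note that none of this requires binning either variable: every observation enters with its actual age and BMI.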

Nick Stauner
  • Thank you. Wouldn't chi-square be appropriate if it were gender instead of age? – user53740 Aug 28 '14 at 06:54
  • No. BMI is also continuous. You'd want something like a t-test, Mann–Whitney–Wilcoxon test, or bootstrap test in that case. – Nick Stauner Aug 28 '14 at 06:56
  • I didn't mean BMI as a continuous variable, but the categories of BMI (underweight, normal, and overweight). – user53740 Aug 28 '14 at 07:00
  • But why throw away good information? "Overweight" collapses BMIs of 26 and 40 into the same group, although they're very different... – abaumann Aug 28 '14 at 07:03
  • So a scatter plot of BMI against age to study how BMI changes with age, and a Mann–Whitney test for gender, will do? – user53740 Aug 28 '14 at 07:11
  • Basically. I'd add a regression line to the scatterplot too (and probably a confidence band for good measure). If you use Pearson's r to estimate the effect size, use ordinary least squares estimation of the regression slope. If you use Kendall's $\tau$, the Theil–Sen slope corresponds to this (I recommend the mblm package if you can use R). You may also wish to consider nonlinear associations such as curvilinear relationships, depending on the size of your sample and your tolerance for / the utility of a more complex model. – Nick Stauner Aug 29 '14 at 03:00
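
To make the two slope estimates in the last comment concrete, here is a rough standard-library Python sketch. The data are made up for illustration and the function names are my own. The Theil–Sen estimate here is the plain median of all pairwise slopes; it is not meant to reproduce the output of the mblm package, which may use a different median-based estimator depending on its settings.

```python
from itertools import combinations
from statistics import median


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (pairs with Pearson's r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of the slopes through every pair of points with distinct x
    (pairs with Kendall's tau)."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1
    ]
    return median(slopes)


# Hypothetical data, for illustration only.
ages = [20, 30, 40, 50, 60]
bmi = [21.0, 23.5, 22.8, 26.1, 27.4]

ols = ols_slope(ages, bmi)
ts = theil_sen_slope(ages, bmi)
print(f"OLS slope = {ols:.3f} BMI units per year")
print(f"Theil-Sen slope = {ts:.3f} BMI units per year")
```

Because the Theil–Sen estimate is a median, a single extreme BMI value moves it far less than it moves the least squares slope, which is the usual reason to pair it with a rank-based measure like Kendall's $\tau$.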