I am trying to test if there has been a significant change in the weight of two groups, one without exercise and one with exercise.

The two samples have large spread, which creates a standard deviation that is larger than the mean for each group.

My question is: is it still valid to use the unpaired t-test to compare means when the standard deviation is larger than the mean?

If not, what statistical test can I perform?

Glen_b

3 Answers


If you can, you should show us the plots of the two distributions of people.

However, for weight loss among people who span a large range of weights, it may be preferable to use a log scale regardless of whether the distributions are skewed. That is because weight loss is (I think) better thought of on a ratio basis than on an additive basis. For a 100 pound person to lose 10 pounds is not the same as for a 300 pound person to lose 10 pounds.
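To make the ratio idea concrete, here is a minimal sketch using only the Python standard library. The helper name `log_ratio` and the two example people are made up for illustration; they are not from the question's data.

```python
import math

def log_ratio(before, after):
    """Log of after/before; negative values mean weight was lost."""
    return math.log(after / before)

# Two hypothetical people who each lose 10 pounds from different starting weights.
light = log_ratio(100, 90)
heavy = log_ratio(300, 290)

# expm1 converts the log-ratio back to a relative change.
print(f"100 lb -> 90 lb:  log-ratio {light:.4f}, relative change {math.expm1(light):+.1%}")
print(f"300 lb -> 290 lb: log-ratio {heavy:.4f}, relative change {math.expm1(heavy):+.1%}")
```

On the additive scale both changes are identical (10 pounds); on the log scale the lighter person's change is roughly three times as large, which is the distinction the paragraph above is pointing at.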

There are also questions of what covariates you should include, whether the people are randomized and so on.

Peter Flom

There isn't in my view a single, indisputable analysis that can be recommended on this information: it may even be that comparison of means is not a good objective.

You need to look at your data very carefully. SD > mean implies considerable skewness and possibly even outliers within your data. The outliers are likely to be genuine, i.e. some very heavy people, but they still raise questions for your analysis.

You may be better off working on a log scale, or more generally using a generalised linear model with various link functions, to check which conclusions are consistent across different ways of examining the data.

Nick Cox
  • SD > mean doesn't necessarily imply skewness or outliers. If the data is non-negative then that argument could be made but I just think it should be noted that it's perfectly possible to have this occur. But with something like 'change in weight' I might expect low means and high standard deviations... – Dason May 13 '13 at 15:15
  • Correct and worth emphasis. But my wording "within your data" meant what it said. – Nick Cox May 13 '13 at 15:22

To answer your question directly, standard deviations larger than the means do not cause any problem in and of themselves. That said, in a normal distribution, this would imply many negative values. Since people's weights can't be negative, this would suggest that your data has other characteristics (skew, perhaps some large outliers or heteroskedasticity, even missing values incorrectly coded as 0) that could be a problem for the proper interpretation of a t-test. Finding out which ones and how to address them requires other information (e.g. boxplots).
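As a rough check of the "many negative values" remark, the sketch below computes the share of a normal distribution that falls below zero for a few mean/SD pairs. The numbers are invented for illustration, and the function name `fraction_negative` is my own; it relies only on `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def fraction_negative(mean, sd):
    """Share of a normal(mean, sd) distribution that lies below zero."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)

# Hypothetical mean/SD pairs (e.g. weights in kg), not taken from the question.
for mean, sd in [(70, 15), (70, 70), (70, 100)]:
    print(f"mean={mean}, sd={sd}: {fraction_negative(mean, sd):.1%} below zero")
```

Once the SD reaches the mean, about one value in six would be negative under normality, and more as the SD grows. For a quantity that cannot be negative, that is a sign the data are not close to normal. For a quantity such as change in weight, negative values are perfectly plausible and this argument does not apply.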

Gala
  • Having plotted the data, I see that there are some outliers (heavy people) that increase the standard deviation relative to the mean. Removing them would make my data more "normal"; however, I feel that this defeats the purpose of the test. Would it be appropriate to perform a non-parametric test like the Mann-Whitney U test? – user25000 May 13 '13 at 08:53
  • Don't just remove observations because it makes the data "more normal"! If these outliers are valid data points (i.e. the measurements are okay), you have to include them in your analysis. As Nick Cox said: you might try to transform the data. A non-parametric test like the Wilcoxon test might be useful as well. – COOLSerdash May 13 '13 at 09:04
  • @user25000: Note that Mann-Whitney and Wilcoxon are basically the same idea. – Nick Cox May 13 '13 at 10:55
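
For readers who want to see what the rank-based test in these comments does with outliers, here is a minimal standard-library sketch of the Mann-Whitney U statistic. The data are invented, the function names are my own, and the p-value uses a plain normal approximation with no tie or continuity correction, so it is only rough for small samples. In practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mann_whitney_p(x, y):
    """Two-sided p-value via the normal approximation (no tie/continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * NormalDist().cdf(-abs(z))

# Made-up weight changes in kg (negative = weight lost), each group with one outlier.
exercise = [-4.1, -2.5, -3.0, -0.8, -6.2, -1.9, -25.0, -2.2]
control = [0.5, -1.0, 1.2, -0.3, 2.0, -0.6, 0.9, 14.0]

print("U =", mann_whitney_u(exercise, control))
print("approx. two-sided p =", round(mann_whitney_p(exercise, control), 4))

# Making the outlier ten times more extreme leaves U unchanged: only ranks matter.
exercise_extreme = [-250.0 if v == -25.0 else v for v in exercise]
print("U with a more extreme outlier =", mann_whitney_u(exercise_extreme, control))
```

Because the statistic depends only on the ordering of the observations, valid outliers can stay in the analysis without dominating the result. Note that the test then addresses whether one group tends to have larger values than the other, which is not the same question as a comparison of means.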