Hello, I have 100 students who reported the time they spend sitting, and I divided them into three groups: Group 1: (6-9) hours, Group 2: (10-13) hours, Group 3: (14-17) hours.

The students also answered the Boston questionnaire symptom severity scale, which has 11 questions, each scored from 1 to 5.

Below are the Boston scores, divided by sitting-time group.

(6-9 hours) group 1: 11 14 12 23 16, and there are 78 more numbers in this group.

(10-13 hours) group 2: there are 15 numbers.

(14-17 hours) group 3: there are two numbers, 17 17.

I want to compare the means of these three groups. Because the sample sizes differ greatly between my groups, I used the Welch ANOVA test. But because the variance in group 3 is 0, the Welch test cannot be performed. How should I solve this problem?

I hope I explained it clearly; sorry for my English. Thank you.

Link to picture https://drive.google.com/file/d/1wDjgXA1SnqGHhJq7PZrWpuzTWBBtDXIH/view?usp=drivesdk

kareen kk
  • Please say more about the nature of the values whose means you are trying to compare. Please do that by editing the question, as comments are easy to overlook and can be deleted. Depending on the nature of your data, it might be possible to do either a standard linear model related to ANOVA or a non-parametric test. With only 2 cases in one group, however, you are unlikely to find it significantly different from other groups in any event. Consider why you think it's important to evaluate that group at all. – EdM Mar 14 '22 at 17:57
  • Obviously you don't have enough observations to estimate the variance in group 3. One option to get around this is to assume all variances equal (as standard one-way ANOVA does). Is there any reason why you can't do that? In case the variances in group 1 and 2 don't look strongly different, it's probably sensible. – Christian Hennig Mar 14 '22 at 18:05
  • Because I don't have equal sample sizes, I think I can't use the regular ANOVA. – kareen kk Mar 14 '22 at 21:02

1 Answer

This is probably the main source of your problem:

The students answered in hours the time that they spent sitting and I divided it into the following three groups...

Breaking a continuous predictor into groups is not a good idea. If you keep the hours sitting as a continuous predictor you could use standard regression techniques to examine the relationship between your symptom severity score and hours sitting. You are otherwise making assumptions, for example, that there's a big difference between sitting for 9 hours versus 10 hours but no difference between 10 and 11 hours.
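
As a rough illustration of what keeping hours continuous looks like, here is a minimal least-squares sketch in plain Python. The `hours` and `scores` values are made up for the example (they are not your data), and `fit_line` is just a hypothetical helper name; any statistics package, SPSS included, will fit the same model and also report standard errors and p-values.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up illustrative values, NOT the data from the question
hours = [6, 7, 8, 9, 10, 12, 14, 17]
scores = [11, 14, 12, 23, 16, 18, 17, 17]

slope, intercept = fit_line(hours, scores)
print(f"estimated change in score per extra hour of sitting: {slope:.2f}")
```

Each student contributes one (hours, score) pair, so no cutoffs between groups have to be chosen and the 2 students with the longest sitting times stay in the analysis.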

Added in response to comments:

If you have additional predictors to evaluate, perform multiple regression rather than separate regressions. Otherwise you run a risk of omitted variable bias that could affect all of your estimates of associations with outcome.

If for some reason you are forced to use some form of ANOVA for these data, despite its far-from-ideal applicability, you have a few choices:

You could just do a standard ANOVA, without the Welch modification, assuming equal underlying variances around group means. That's what Christian Hennig suggested in a comment on the question. You could then check that assumption on the residuals with standard techniques, such as comparing the spread of the residuals across groups, and check their normality with normal q-q plots.

You could use the non-parametric Kruskal-Wallis test, which addresses a question similar to ANOVA's but does not assume normally distributed errors. With only 2 cases in one of your groups, however, I don't suspect that will be helpful.

Instead of discarding the 2 cases in group 3, you could add them into group 2. Then you could do a t-test between the 2 groups (6-9 hours; 10+ hours). If you are concerned about unequal variances between the 2 groups, you can use the Welch modification.

I'm very reluctant to recommend discarding data unless you have strong evidence that there was something wrong in the way it was collected.
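
To make the first and third choices concrete, here is a small sketch in plain Python (standard library only). The group scores are illustrative: only the two 17s in group 3 come from the question, the rest are made up, and the function names `one_way_anova_f` and `welch_t` are hypothetical helpers rather than any package's API. It also shows where the Welch ANOVA breaks: each group is weighted by n_i / s_i^2, which is undefined when a group's sample variance is 0.

```python
from math import sqrt
from statistics import mean, variance

def one_way_anova_f(groups):
    """Classical one-way ANOVA F statistic (pooled within-group variance)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Illustrative scores only; just the two 17s are taken from the question
g1 = [11, 14, 12, 23, 16]
g2 = [15, 19, 13, 18]
g3 = [17, 17]

# Choice 1: standard ANOVA still works because the variance is pooled
print("pooled-variance F:", round(one_way_anova_f([g1, g2, g3]), 3))

# Welch's ANOVA needs n_i / s_i^2 for every group, which fails for group 3
try:
    weights = [len(g) / variance(g) for g in (g1, g2, g3)]
except ZeroDivisionError:
    print("Welch weights undefined: group 3 has zero sample variance")

# Choice 3: merge group 3 into group 2 and compare 2 groups with Welch's t
t, df = welch_t(g1, g2 + g3)
print(f"Welch t = {t:.3f} on about {df:.1f} degrees of freedom")
```

The sketch stops at the test statistics; turning them into p-values needs the F and t distributions, which your statistics software provides.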

EdM
  • Sorry what do you mean by continuous predictor? – kareen kk Mar 14 '22 at 18:59
  • @kareenkk just use the reported values on the scale of 6 hours, 7 hours,...16 hours, 17 hours of sitting as the predictors in a regression model rather than breaking them up into groups to use ANOVA. – EdM Mar 14 '22 at 19:02
  • I'm a research student and I'm following the approach of a specific study. For example, they divided the students' ages into three groups and compared the Boston scores across the three groups; I will add a picture of what I mean. They used ANOVA for that. – kareen kk Mar 14 '22 at 19:06
  • @kareenkk it's much better not to break observations down into groups, even if some other researchers did that. If you nevertheless are forced to break down into groups and the cutoffs between groups can't be changed to make groups of more comparable sizes, you could still try to model with ANOVA. You assume a constant within-group variance to start, do the ANOVA, and then evaluate that assumption by examining the distribution of residuals about the group means, for example with plots called "normal q-q plots." – EdM Mar 14 '22 at 19:35
  • We already submitted the proposal that's why I can't change the methods. Thank you sir for your time – kareen kk Mar 14 '22 at 19:41
  • What kind of proposal prevents you from doing it better? – Michael Lew Mar 14 '22 at 19:46
  • I mean, sir, I wrote and submitted the research proposal, so the research project has to follow it. That's what I understood, if I'm not wrong. – kareen kk Mar 14 '22 at 19:52
  • @kareenkk it's good practice to use improved methods over what you have originally proposed, if that allows you to answer the underlying scientific question more precisely. You might want to check with those who approved and accepted your research proposal to be sure, but it's hard to imagine that there would be any objection to your using an improved statistical analysis method that you have now learned about. – EdM Mar 14 '22 at 20:00
  • Thank you for answering – kareen kk Mar 14 '22 at 20:09
  • @EdM sorry sir for asking so many questions. I set the three sitting-time groups in SPSS as a nominal variable, not a scale variable. I used a scale variable only for the symptom severity scores. Is this correct? – kareen kk Mar 14 '22 at 21:32
  • @EdM I gave a value of 1 to the (6-9) group, 2 to the (10-13) group, and 3 to the (14-17) group. – kareen kk Mar 14 '22 at 21:42
  • @kareenkk my suggestion is to use the actual hours spent sitting as a scale-variable predictor in a regression model. For each individual, use the sitting hours as a predictor and the symptom score as the outcome. If the data on actual hours spent sitting aren't still available for the individuals, then your assignment of 1/2/3 values to the groups as in your last comment will respect the ordering from less time to more time in a way that a standard ANOVA wouldn't. Hard to say if that will be better; there might be a peak in symptoms at intermediate sitting times. – EdM Mar 14 '22 at 22:03
  • @EdM If I'm forced to perform ANOVA, can I use the Welch ANOVA and just ignore group 3, comparing only groups 1 and 2? The Welch ANOVA can't be performed with group 3 because its variance is 0 (the two observations are 17, 17). And if I had 16, 17 in group 3, so still only two observations in one group, would that be a problem for the Welch ANOVA? What is the minimum number of observations a group must have in order to perform Welch ANOVA? Also, I only need the p value and F value; do I have to do the normal q-q plots? Thank you – kareen kk Mar 15 '22 at 08:39
  • @kareenkk I added to the answer to address those issues. You seem to have 2 identical comments here; please delete 1 of them. – EdM Mar 15 '22 at 10:08
  • @Edm sorry, I added the comment twice accidentally. Thank you very much for your time and responses; it's now very clear to me. – kareen kk Mar 15 '22 at 10:41
  • @Edm I'm allowed to change my research methods. Can I perform linear regression analysis for the two groups, sitting and symptom score, as in this link? https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php – kareen kk Mar 15 '22 at 16:17
  • @kareenkk if you are only analyzing the 2 groups the regression won't be different from a t-test. If you have the actual hours sitting for each individual, then the regression would be a better approach. – EdM Mar 15 '22 at 16:25
  • @EdM yes, I have the time in hours for each subject, so I won't divide them into groups. But I have another category, typing per hour, with the same symptom score. In this case, can I do regressions separately for sitting and typing instead of doing multiple regression? – kareen kk Mar 15 '22 at 16:42
  • @kareenkk do multiple regression. – EdM Mar 15 '22 at 17:00
  • @EdM thank you very much – kareen kk Mar 15 '22 at 17:01
  • @Edm sir, if you have time I would like to hear your opinion on the question I asked at this link: https://stats.stackexchange.com/questions/568726/linear-regression-assumptions-violated?noredirect=1#comment1049093_568726 Thank you – kareen kk Mar 22 '22 at 22:36