
I have a dataset containing angles. They represent the bending angle that a seedling makes when growing toward light. I have two factors, treatment and genotype, so I use a two-way ANOVA. However, the heteroscedasticity assumption is not met. I have really low p-values for my Bartlett and Levene tests, and my boxplot looks like this: [boxplot image]

So I want to transform my data in order to apply my ANOVA, but I don't know which transformation I could use. Do you have any idea? It seems like a transformation would not really help. However, I am almost only interested in the interaction here, and it is highly significant. Even if the assumptions are not met, can I consider the results significant, given that the p-values are very low? Also, I would like to run a post-hoc test, and I am wondering whether I could use Dunn's test, specifying that every group is a combination of the two factors. I know it is meant for a one-way design, but is it problematic if I create new groups that are combinations of the two factors? That would give me the differences for the interactions, right?

  • Are you confusing "homoscedasticity" with "heteroscedasticity"? Homoscedasticity means the conditional variance of the response is constant across the distribution of covariates. – AdamO Oct 27 '23 at 17:01
  • If there is an assumption (read: ideal condition) here, it is of homoscedasticity, not heteroscedasticity. To allow discussion and experiment, two things would help: (1) posting your data (2) explaining the outcome variable in more detail. The axis labels 0 and 100 could be supplemented. If one group varies over about 180 degrees, and another group much less, then the scope for achieving homoscedasticity seems limited. – Nick Cox Oct 27 '23 at 17:02
  • Please look at this very similar question and its answers. I think that your question might be considered a duplicate. If you don't think so, please edit your question to highlight what you think is still unclear. Edit the question rather than adding a comment, as comments are easy to overlook and can be deleted. – EdM Oct 27 '23 at 17:09
  • Hello, thank you very much for your answer! sorry I always confuse those two words! – Marius Audenis Oct 27 '23 at 18:03
  • The roots homo, hetero mean respectively same and different, as in homogeneous, heterogeneous; homosexual, heterosexual. – Nick Cox Oct 28 '23 at 09:07
  • Needs a cross-reference to https://stats.stackexchange.com/questions/629844/non-parametric-anova Please don't run two threads that are so close at the same time. – Nick Cox Oct 28 '23 at 14:17
  • Okay no problem! To do this, do I just mention that there is a cross reference and paste the link of the other thread in my question? – Marius Audenis Oct 28 '23 at 16:00
  • My comments already supply cross-references; the larger point is (ideally) not to run two similar threads at once, and the smaller point is that -- if you do -- then you should provide a cross-reference yourself in future. – Nick Cox Oct 28 '23 at 16:49

1 Answer


A monotonic transformation can reduce heteroskedasticity when the spread is a function of location (albeit with some sampling noise, naturally). For example, if the conditional standard deviation is proportional to the conditional mean, then a log transformation will tend to give nearly constant variance. More generally, see the Wikipedia article on variance-stabilizing transformations.
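As a rough illustration of the log case, here is a minimal simulation sketch in Python (standard library only). The group locations, the common log-scale sd, and the sample size are all assumptions chosen for the demonstration; none of it is the data from the question.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical groups: lognormal data, for which the sd is proportional to the mean.
log_means = [1.0, 2.0, 3.0]  # assumed group locations on the log scale
sigma = 0.3                  # assumed common sd on the log scale
groups = [[rng.lognormvariate(m, sigma) for _ in range(500)] for m in log_means]

# Spread per group before and after taking logs.
raw_sds = [statistics.stdev(g) for g in groups]
log_sds = [statistics.stdev([math.log(x) for x in g]) for g in groups]

print("sd on raw scale:", [round(s, 2) for s in raw_sds])
print("sd on log scale:", [round(s, 2) for s in log_sds])
```

On the raw scale the standard deviations grow with the group mean; after taking logs they should all sit near the assumed `sigma`, up to sampling noise.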

Here's an example of the kind of appearance where a transformation helps -- that is, where the spread (sd, IQR, etc) changes like it was some smooth function of the location (mean, median etc):

display consisting of 3 subplots. 1. Boxplot of 3 groups with IQR increasing as the median increases. 2. Plot of IQR vs median, so the functional relationship is more explicit. 3. Boxplot of 3 groups after an approximately variance stabilizing transformation, with much more similar spreads

No transformation of the response variable will do much to "fix" the impression of heteroskedasticity if you have larger and smaller spreads at essentially identical locations:

boxplots of response (angles) for 6 groups (2 treatments x 3 genotypes)
(Note that for each of the two large spreads, there is a smaller spread at a similar location. The plot is from the original post by Marius Audenis.)

With no particularly clear "functional" pattern of spread-vs-mean, there would be no variance-stabilizing transform.
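The point about different spreads at the same location can be checked with a small simulation. The sketch below uses two hypothetical groups (the means, sds, and sample sizes are assumptions, not the seedling data) and a handful of monotonic transformations that are defined for negative values; it compares the ratio of interquartile ranges before and after each transformation.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def iqr(values):
    """Interquartile range from the three quartile cut points."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical groups: same location (20 degrees), very different spreads.
narrow = [rng.gauss(20, 5) for _ in range(400)]
wide = [rng.gauss(20, 25) for _ in range(400)]

# A few monotonic transformations that also accept negative values.
transforms = {
    "identity": lambda x: x,
    "signed sqrt": lambda x: math.copysign(math.sqrt(abs(x)), x),
    "asinh": math.asinh,
    "cube root": lambda x: math.copysign(abs(x) ** (1 / 3), x),
}

ratios = {}
for name, f in transforms.items():
    ratios[name] = iqr([f(x) for x in wide]) / iqr([f(x) for x in narrow])
    print(f"{name:12s} IQR ratio wide/narrow = {ratios[name]:.2f}")
```

Because a monotonic transformation stretches or compresses the scale in the same way for both groups around a shared location, the ratio stays well away from 1 for every transformation tried here.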

You might be able to find a transformation that improves things a little (spreading out the largest angles and compressing the most negative ones), but I don't think that is likely to be especially helpful overall without straying well outside the usual simple transformations. Indeed, hunting for one using the same data you want to use in a hypothesis test will affect the properties of your inference, and it won't help the changing spread all that much anyway.

A second (arguably small) issue in that case is that "typical" transformation families (like Box-Cox) assume positive data. In the boxplot above, the groups with lower medians have values on either side of zero.
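To make the positivity requirement concrete, here is a minimal hand-written sketch of the Box-Cox transform. The function `box_cox`, the example angles, and the choice of lambda are all hypothetical and are there only to show where negative or zero values cause trouble.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a single value; only defined for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

angles = [-12.0, -3.5, 0.0, 8.0, 41.0]  # hypothetical angles, some <= 0

try:
    transformed = [box_cox(a, 0.5) for a in angles]
except ValueError as err:
    transformed = None
    print("cannot transform:", err)
```

The explicit check is a design choice in this sketch: without it, Python would silently return a complex number for a negative value raised to a fractional power, which hides the problem rather than solving it.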

Glen_b
  • Thank you! But then if a transformation could not really help, what do I do? Can I just analyse my data like that and mention that heteroskedasticity is not fixed? Or is there any other solution like a non parametric test? Also I forgot to mention but my data is not balanced so maybe that also plays a role. – Marius Audenis Oct 28 '23 at 08:11
  • I think that's a question where the issues are considerably more involved than suitable for a brief reply in a comment, and one in which people will likely have a range of thoughts about what might be best in different circumstances. Perhaps it would be a good thing to ask as a new question (including the part about unbalanced data). – Glen_b Oct 28 '23 at 15:51