
I'm trying to find an NHST for my data which, as far as I know, are only compatible with a Kruskal-Wallis (KW) test. However, my variables aren't really identifiable as either dependent or independent; either one could depend on the other. I understand that a KW test requires a nominal IV and a scale (or ordinal) DV, so if it's reasonable to run a KW test with variables like these, I'd assign the IV and DV roles on that basis. If not, are there any alternatives?

For context, I'm trying to determine whether there's any association between range size (km², scale) and food-storing strategy (nominal with three categories: Strategy A, B and C) in the species of a particular taxonomic family (e.g. whether species using Strategy A differ significantly in their range sizes from species using Strategy C). I've put an example of my data below.

| Species   | Strategy   | Range size (km²) |
|-----------|------------|------------------|
| Species A | Strategy C | 1020000          |
| Species B | Strategy B | 1520000          |
| Species C | Strategy A | 19300000         |
| Species D | Strategy C | 8400             |
| Species E | Strategy C | 16900000         |
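As a purely exploratory sketch, the five example rows above can be grouped by strategy and put on a log10 scale, since the range sizes span several orders of magnitude. The function name `log10_by_strategy` is made up for illustration, and five rows are far too few to support any conclusion; this only shows the shape of the summary.

```python
import math
from collections import defaultdict
from statistics import median

# The five example rows from the post: (species, strategy, range size in km^2).
rows = [
    ("Species A", "Strategy C", 1_020_000),
    ("Species B", "Strategy B", 1_520_000),
    ("Species C", "Strategy A", 19_300_000),
    ("Species D", "Strategy C", 8_400),
    ("Species E", "Strategy C", 16_900_000),
]


def log10_by_strategy(rows):
    """Group log10(range size) by strategy (hypothetical helper)."""
    groups = defaultdict(list)
    for _species, strategy, range_km2 in rows:
        groups[strategy].append(math.log10(range_km2))
    return dict(groups)


groups = log10_by_strategy(rows)

# Print group size and median log10 range size for each strategy.
for strategy in sorted(groups):
    values = groups[strategy]
    print(strategy, "n =", len(values), "median log10(km^2) =", round(median(values), 2))
```

With the full dataset, the same grouping could feed a boxplot or strip plot of log range size per strategy before any test is chosen.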
koloeus
  • Welcome to Cross Validated! You say that your data are appropriate for Kruskal-Wallis. Could you please explain why? – Dave Oct 29 '23 at 23:25
  • @Dave Hi! I believe that it's the most appropriate test as I have two variables (one continuous, the other categorical with >2 categories) where the data are unrelated. They also don't meet parametric assumptions, otherwise I would have gone with a one-way ANOVA. – koloeus Oct 30 '23 at 00:00
  • So is your question if the continuous distributions are different across the categories? – Dave Oct 30 '23 at 00:21
  • @Dave That's right, yes. – koloeus Oct 30 '23 at 00:25
  • Then what is the concern about dependent and independent variables? – Dave Oct 30 '23 at 01:16
  • The independent (meaning, grouping) variable could be ordinal; it's just that Kruskal-Wallis doesn't take account of the ordering and in that sense as well as others the test doesn't make use of all the information in the data. More importantly, it is hard to know whether your take is too pessimistic as we never see the data or learn anything precise about them. Just possibly, analysis of variance would work well enough despite imperfections, either without or with a suitable transformation. – Nick Cox Oct 30 '23 at 01:45
  • For example, areas are likely to be positively skewed, but that suggests working with their square roots or logarithms. – Nick Cox Oct 30 '23 at 02:04
  • @Dave I don't know which variable is dependent on the other (i.e. if range size is dependent on strategy or if strategy is dependent on range size) and I'm concerned that a KW isn't reliable when they're not identifiable. I'm asking if it's suitable to use the nominal/categorical variable as the IV solely because it's what I've used to group the (continuous) range size data, or if that's not appropriate. – koloeus Oct 30 '23 at 12:07
  • When you're genuinely trying to find out something, NHSTs are not the right tool. You should instead be exploring this relationship in your data. And since you characterize this as a mutual relationship, why impose the concepts of independent and dependent variables on your exploration at all? – whuber Oct 30 '23 at 13:41
  • @whuber I'm just not experienced enough to know how else I would approach analysing these data. – koloeus Oct 30 '23 at 14:14
  • It's going to be difficult for us to advise you generally, but to the extent you can articulate your problem and describe your data we might be able to help. – whuber Oct 30 '23 at 14:27
  • Based on your edit KW seems very unlikely to be what you need. Linear regression, possibly after transforming range size, looks possible. – mdewey Oct 30 '23 at 15:15
  • @whuber I'd really appreciate that. I'm really trying to determine whether there's any association between range size and strategy in the taxonomic family these species belong to (e.g. whether species using Strategy A differ significantly in their range sizes from species using Strategy C). Is this enough clarification? I've added an example of my dataset to the original post. – koloeus Oct 30 '23 at 15:23
  • Your range sizes are values like 19.3 or 16.9 million sq. km? Implausible examples don't help. Alternatively, there is some elementary mistake there about units of measurement. – Nick Cox Oct 31 '23 at 12:57
  • To give a sense of what those numbers mean, South America is listed at $17.8$ million square kilometers, and the European Union is listed at $4.2$ million square kilometers. – Dave Oct 31 '23 at 13:07
  • @NickCox These are real range sizes from a sample of the species I'm researching. Their distributions range from 120 to 130 million sq. km. – koloeus Nov 01 '23 at 20:54
  • Kruskal-Wallis and ANOVA don't care about the causal relationship and whether variables are dependent or independent or which are dependent/independent. The tests relate to establishing whether there is a statistical relationship. This question has been closed for being unclear, but I imagine that it can also be seen as a duplicate... – Sextus Empiricus Nov 03 '23 at 08:20
  • ... I remember a post that criticised the use of the terms dependent/independent (maybe @NickCox wrote it?). While searching I found related posts. This question relates a bit to "Under which assumptions a regression can be interpreted causally?" and "Are all statistical models also causal models?" – Sextus Empiricus Nov 03 '23 at 08:22
  • This post might also be a bit of an XY problem. The question in the title is about the KW test and dependent/independent variables, but the underlying problem is "trying to find a NHST for my data". Finding an NHST starts with formulating a hypothesis, potentially including assumptions about the distribution of the data. (Also, you don't find a test for just 'data'; you find a test for your experiment. Data alone, without context, are not meaningful.) – Sextus Empiricus Nov 03 '23 at 08:50
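
The points made in the comments (the test only needs a grouping variable and says nothing about causal direction, and a log transformation may make ANOVA workable) can be sketched with SciPy. The range sizes below are HYPOTHETICAL values simulated from log-normal distributions, not the asker's data, and the group sizes and parameters are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# HYPOTHETICAL range sizes (km^2) for three strategies; simulated, not real data.
ranges = {
    "A": rng.lognormal(mean=14.0, sigma=2.0, size=12),
    "B": rng.lognormal(mean=13.0, sigma=2.0, size=10),
    "C": rng.lognormal(mean=13.5, sigma=2.0, size=15),
}

# Kruskal-Wallis only needs the values split by group; no causal direction is implied.
kw = stats.kruskal(ranges["A"], ranges["B"], ranges["C"])

# Passing the groups in a different order gives the same statistic.
kw_reordered = stats.kruskal(ranges["C"], ranges["A"], ranges["B"])

# The test is rank-based, so a monotone transformation such as log10 leaves it unchanged.
log_groups = [np.log10(values) for values in ranges.values()]
kw_log = stats.kruskal(*log_groups)

# One-way ANOVA on the log scale, as an alternative when logs look roughly symmetric.
anova_log = stats.f_oneway(*log_groups)

print("Kruskal-Wallis H:", kw.statistic, "p:", kw.pvalue)
print("ANOVA on log10 F:", anova_log.statistic, "p:", anova_log.pvalue)
```

Either test addresses whether the distribution of range size differs across strategies; neither distinguishes "strategy affects range size" from "range size affects strategy".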
