
I previously posted a question about comparing distributions of categorical outcomes between groups. I haven't found a good answer to the time-dimension part of it, but as a starting point I tried a simple χ2 test for each time period, to check whether the distribution of outcomes (3 levels) is the same between the comparison group[s] and a baseline group.

Background / Goal: Basically, there is a relatively well-established underlying process (everyone agrees that it is happening) that generates the distribution of outcomes in the baseline group. There is speculation that different processes may be generating the distribution of outcomes in the comparison groups - but much less agreement on whether those different processes are actually happening, and they're not directly observable in any reliable way. A priori, though, it is plausible that the exact same process generating the outcomes in the baseline group is also at work in the comparison groups, but nobody has closely examined the data. Establishing whether it is the same process or a different one (or, for instance, historically the same but recently diverging) is an important empirical question.

My argument is that if it is the exact same process, then we would expect the distribution of outcomes in the baseline and comparison groups to be very similar. That said, data quality here is generally poor, not everything is observable, and it's not strictly an either-or (same vs. different process). Also, my sample sizes are quite large (~10,000-50,000) and, if it matters, somewhat uneven between groups (the baseline group is several times larger than the comparison groups), so formal tests like χ2 result in minuscule p-values that are pretty uninformative (see also these previous answers, also this discussion and this commentary).

After further thought and some helpful discussion with folks here, it seems I should instead focus on some way to quantify how similar or different the outcome distributions in baseline vs. comparison groups are. For instance, it would be meaningful if I could say that these distributions are very similar (or not at all similar), or that they have been very similar for some time but are starting to diverge. Is there a good formal way to quantify / express the similarity of these categorical distributions?

(I'm also working on visually highlighting the similarity, but that's less challenging; also, I'd like a more rigorous way to demonstrate it.)

[Edit: adding example data] For instance, taking one cross-section of my data:

Outcome     A      B     C
Group
0        5086  27817  1858
1        2160   9996   491
2        1505   8477  1664

Or, as proportions of each group:

Outcome      A      B      C
Group
0        0.146  0.800  0.053
1        0.171  0.790  0.039
2        0.129  0.728  0.143

For this time point, it seems (and looks, when graphed) like groups 0 and 1 have very similar distributions, while group 2 differs somewhat more (but not wildly so) from group 0. But can I put a number on these hand-wavy comparisons?
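To make the question concrete, here is a minimal sketch of two candidate measures applied to the counts above: total variation distance (half the sum of absolute differences in proportions) and Cramér's V (the χ2 statistic rescaled by sample size, so it does not grow just because n is large). Picking these two measures is an assumption for illustration, not something settled in the discussion, and the function names are made up for this example.

```python
from math import sqrt

# Counts from the example cross-section (outcomes A, B, C per group).
counts = {
    0: [5086, 27817, 1858],  # baseline group
    1: [2160, 9996, 491],
    2: [1505, 8477, 1664],
}


def proportions(row):
    """Convert a row of counts into within-group proportions."""
    total = sum(row)
    return [x / total for x in row]


def total_variation(row_a, row_b):
    """Total variation distance: 0 = identical, 1 = no overlap."""
    p, q = proportions(row_a), proportions(row_b)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cramers_v(row_a, row_b):
    """Cramér's V for the 2 x k table formed by two groups."""
    table = [row_a, row_b]
    n = sum(row_a) + sum(row_b)
    row_totals = [sum(r) for r in table]
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(row_a)) - 1  # min(rows, cols) - 1
    return sqrt(chi2 / (n * k))


# Compare each comparison group against the baseline (group 0).
for g in (1, 2):
    tvd = total_variation(counts[0], counts[g])
    v = cramers_v(counts[0], counts[g])
    print(f"group 0 vs {g}: TVD={tvd:.3f}  Cramer's V={v:.3f}")
```

On this cross-section the total variation distance is about 0.024 for group 1 and about 0.089 for group 2, which can be read as the share of probability mass that would have to move between outcomes to make the two distributions match. Computing the same quantity per time period would give a series that can be tracked for divergence.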

TY Lim
  • The chi-squared test (and other tests) do not "break down:" they work exactly as intended. What breaks down is your analytical strategy: it's usually pointless to conduct a formal hypothesis test with large datasets. Explore deeper, more detailed, meaningful relationships instead. In light of this, your question is far too vague to be answerable. In order to propose "alternatives," we need you to tell us what your analytical objectives are and to describe your data. – whuber Jun 22 '22 at 12:15
  • I've added more detail on the analytical goal that hopefully helps clarify what relationships I'm trying to illuminate. I know the test is working as intended, and did note in the original question that the test isn't wrong so much as hard to interpret meaningfully (i.e. pointless, as you said). – TY Lim Jun 22 '22 at 15:11
  • That helps, thank you. But the chi-squared test will do exactly what you ask: it will help you determine whether any discrepancies you observe are unlikely to be attributable to chance. Your underlying issue, which appears more important, concerns how to identify and quantify "similar." Why not focus on that issue? Also, why do you need a "formal way," when in fact it sounds like you are exploring these data to understand the process? These thoughts suggest looking to methods of visualization and EDA rather than to hypothesis tests in your investigation. – whuber Jun 22 '22 at 17:17
  • Downsampling will not help with anything (except computation time, if that is a problem). It will artificially make your p-values larger, and then what? You need to tell us some more: what is your baseline, what are the comparison conditions, how many, ... maybe show us an example table (use mockup data if your real data cannot be shown). But: think about some exploratory methods, maybe some variant of correspondence analysis ... – kjetil b halvorsen Jun 22 '22 at 17:43
  • @whuber yes, I'm starting to see from our discussion that defining 'similar' is the key question. Meta-question, should I then 1) edit the question further to focus on that, 2) re-title the question around that, 3) post a new question instead, 4) ??? – TY Lim Jun 22 '22 at 18:42
  • Because nobody has ventured an answer yet, you should feel free to make any kinds of edits you like to your question. If, as a result, some of these comments become obsolete or confusing, we can delete them. – whuber Jun 22 '22 at 18:44
  • Thanks, I have done so, hopefully it is clearer and more focused now! – TY Lim Jun 22 '22 at 19:06
  • This is still lacking sufficient detail to attempt an answer, but: various regression models can be seen as extensions of the chi-square test for contingency tables. Binary logistic regression, multinomial logistic regression, ordered regression for the case of an ordinal row/column. For instance see https://stats.stackexchange.com/questions/410421/analysis-for-ordinal-categorical-outcome – kjetil b halvorsen Jun 22 '22 at 21:00
  • @kjetilbhalvorsen I've added an example from my data, hopefully it makes sense. The linked answer seems to be focused on ordinal categorical outcomes, but my outcomes are not ordinal, just categorical – TY Lim Jun 23 '22 at 02:02
  • So look into multinomial logistic regression? – kjetil b halvorsen Jun 23 '22 at 02:35
  • @kjetilbhalvorsen But in my [limited] understanding, multinomial logistic regression would only work with multiple predictor variables, and here I only have one (the population group)... would it still work? – TY Lim Jun 23 '22 at 14:59
  • It would still work! – kjetil b halvorsen Jun 23 '22 at 16:24
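
A sketch of why a single categorical predictor is enough: with only the group indicator, the multinomial logit model is saturated, so its maximum-likelihood coefficients can be read straight off the table as log odds ratios against the baseline group and a reference outcome. Using group 0 and outcome A as the references, and the helper name below, are assumptions for this illustration; no fitting library is involved.

```python
from math import exp, log

outcomes = ["A", "B", "C"]
counts = {
    0: [5086, 27817, 1858],  # baseline group
    1: [2160, 9996, 491],
    2: [1505, 8477, 1664],
}


def log_odds_vs_reference(row, ref_index=0):
    """Log odds of each outcome relative to the reference outcome."""
    return [log(x / row[ref_index]) for x in row]


baseline = log_odds_vs_reference(counts[0])

# Coefficient for (group g, outcome k) in the saturated model:
# log odds of k vs A in group g, minus the same log odds in group 0.
odds_ratios = {}
for g in (1, 2):
    group_log_odds = log_odds_vs_reference(counts[g])
    for k in (1, 2):  # skip the reference outcome A
        coef = group_log_odds[k] - baseline[k]
        odds_ratios[(g, outcomes[k])] = exp(coef)
        print(f"group {g}, {outcomes[k]} vs A: "
              f"coef={coef:+.3f}  odds ratio={exp(coef):.3f}")
```

An odds ratio near 1 means the comparison group matches the baseline for that outcome; here the largest departure is outcome C in group 2, with odds relative to A roughly three times those of the baseline.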

0 Answers