
I have a dataset structured something like the tables below. The numbers are obviously exaggerated, and I actually have hundreds of observations in each group.

Group 1:

Observation  Number of specific items  Total items  Proportion of specific to total
Obs1         10                        100          .10
Obs2         20                        190          .11
Obs3         15                        160          .09

Group 2:

Observation  Number of specific items  Total items  Proportion of specific to total
Obs4         150                       500          .30
Obs5         13                        50           .26
Obs6         75                        250          .30

In this case, each observation has a different total number of items and that number can vary a lot between observations.

I want to compare the Proportion of Specific Items (last column) between the two groups. The hypothesis is that the Proportion of Specific Items is different between the two groups. Essentially, I believe it is a comparison of means of the Proportion of Specific Items columns, but I'm concerned that a t-test is not the right test in this case due to that column being a proportion or a ratio of Specific Items to Total Items.
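To make the quantity being compared concrete, here is a minimal sketch using the toy numbers from the tables above. It computes two different group summaries: the unweighted mean of the per-observation proportions (what a t-test on the last column would compare) and the pooled proportion (total specific items over total items). The function names `mean_of_ratios` and `ratio_of_sums` are illustrative labels chosen here, not part of any library.

```python
# Toy data from the tables above: (specific items, total items) per observation.
group1 = [(10, 100), (20, 190), (15, 160)]
group2 = [(150, 500), (13, 50), (75, 250)]

def mean_of_ratios(obs):
    """Unweighted mean of the per-observation proportions."""
    return sum(s / t for s, t in obs) / len(obs)

def ratio_of_sums(obs):
    """Pooled proportion: total specific items divided by total items."""
    return sum(s for s, _ in obs) / sum(t for _, t in obs)

for name, obs in (("Group 1", group1), ("Group 2", group2)):
    print(name,
          "mean of ratios:", round(mean_of_ratios(obs), 4),
          "ratio of sums:", round(ratio_of_sums(obs), 4))
```

The two summaries differ whenever the totals vary between observations, because the pooled version gives more weight to observations with more items. Which one matches the hypothesis is a choice that has to be made before picking a test.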

A t-test doesn't seem correct since I'm comparing proportions, but a Z-test for two proportions doesn't make sense either, because I have multiple observations in each group and cannot reduce the data to a simple 2x2 contingency table.

Some additional clarifications and details:

Each of the count variables is expected to be distributed as a normal random variable. However, the two are surely correlated since increased total numbers of items would imply increased specific items as well, at least for this particular problem.

After a little more research, I found the ratio distribution. It seems to fit quite nicely, and there is a reference to a solution for the distribution of the ratio of two correlated normal random variables (Hinkley 1969).

Further, similar questions have been asked here:

Test for significant difference in ratios of normally distributed random variables

A/B testing ratio of sums

Based on these results, I feel that my instinct that the t-test is not appropriate was correct. There does not appear to be a specific test for this situation, however. Suggestions included the Delta Method (I need to study this one more closely to feel comfortable with the math) and a permutation test to create an empirical null distribution.
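The permutation idea can be sketched as follows, under assumptions that are mine rather than taken from the linked questions: the test statistic is the difference in pooled proportions (ratio of sums) between groups, and under the null hypothesis the group labels are exchangeable across observations. Each (specific, total) pair is kept intact and only the labels are shuffled. The names `permutation_test` and `ratio_of_sums` are hypothetical helpers.

```python
import random

def ratio_of_sums(obs):
    """Pooled proportion: total specific items divided by total items."""
    return sum(s for s, _ in obs) / sum(t for _, t in obs)

def permutation_test(a, b, stat=ratio_of_sums, n_perm=10_000, seed=0):
    """Two-sided permutation test for stat(a) - stat(b).

    a, b: lists of (specific, total) pairs, one pair per observation.
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = stat(a) - stat(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = stat(pooled[:n_a]) - stat(pooled[n_a:])
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Toy data from the tables above.
group1 = [(10, 100), (20, 190), (15, 160)]
group2 = [(150, 500), (13, 50), (75, 250)]
observed, p_value = permutation_test(group1, group2)
print("observed difference:", round(observed, 4), "p-value:", round(p_value, 4))
```

With only three observations per group there are just 20 distinct label assignments, so the toy p-value cannot be small; the sketch is meant for the real data with hundreds of observations per group. Passing a different `stat` (for example, a mean of per-observation proportions) changes which hypothesis is being tested.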

The permutation test made me think of non-parametric approaches. Something like a Wilcoxon Rank-Sum test seems like it might be appropriate here, and given the number of observations (500+ in each group), I'm not too concerned about loss of power with such a test.
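For reference, a rank-sum test on the proportion column might look like the sketch below. It is a plain-Python version of the Wilcoxon rank-sum (Mann-Whitney U) test using midranks for ties and the large-sample normal approximation, with no continuity correction. The function name `rank_sum_test` is hypothetical; a statistics library would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, z score, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted(list(x) + list(y))
    ranks = {}       # value -> midrank
    tie_term = 0.0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    w = sum(ranks[v] for v in x)      # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2         # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p

# Per-observation proportions from the toy tables above.
props1 = [10 / 100, 20 / 190, 15 / 160]
props2 = [150 / 500, 13 / 50, 75 / 250]
print(rank_sum_test(props1, props2))
```

Two things to keep in mind with this approach: every observation counts equally regardless of its total number of items, and the test addresses whether values in one group tend to be larger than in the other rather than a difference in means. The normal approximation is only reasonable for large samples such as the 500+ per group mentioned above, not for the three-row toy data.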

I appreciate any feedback on using a non-parametric test in this case.

EDIT: Added some clarifying text from my comment. Thanks to @whuber for asking for the clarification which very much helped me to better frame my question.

EDIT: Added some additional clarification regarding the distributions of the variables, their correlation, etc.

KirkD_CO
  • Please tell us what comparisons you wish to make. There is no natural pairing between the groups to suggest what those might be and the answer depends on exactly which comparisons you have in mind. – whuber Aug 17 '22 at 23:02
  • I want to compare the Proportion of Specific Items (last column) between the two groups. The hypothesis is that the Proportion of Specific Items is different between the two groups. Essentially, I believe it is a comparison of means of the Proportion of Specific Items columns, but I'm concerned that a t-test is not the right test in this case due to that column being a proportion or a ratio of Specific Items to Total Items. – KirkD_CO Aug 17 '22 at 23:06

0 Answers