
I have multiple cohorts of subjects (texts) which have been modified to improve readability. The data are displayed as fractions (the numerator is the number of hard-to-read sentences and the denominator the total number of sentences). So after the intervention/modification, the fractions change.

For instance:

1/11 to 3/16, 3/14 to 4/15, 2/8 to 4/9, 0/10 to 4/15

So I am looking for the appropriate statistical test. The number of subjects in each cohort varies between 40 and 180 across groups.

  • So each person has a number of hard sentences, a number of sentences, and a group? And you want to test if the proportion differs across groups? What determines the total number of sentences? – Jeremy Miles Dec 28 '23 at 11:36
  • Hi. Thanks for your interest. Each text has a number of sentences (varying at random from text to text), some of which are hard to read. The text modification aims to improve readability (i.e. decrease the proportion of hard-to-read sentences). So the result is a different proportion, as stated in my example. The texts are grouped in cohorts of varying size depending on things like subject and author. I want to compare the texts within a cohort before/after the modification/intervention. Thanks – Pedro Cunha Dec 28 '23 at 11:53
  • What is the measured outcome variable? What is the hypothesis to test and what sort of statistical variations/sample do you consider? – Sextus Empiricus Dec 28 '23 at 12:26
  • Thanks for your questions. The measured outcome variable will be the proportion of hard sentences in the text, either as a fraction or as a percentage. The null hypothesis is that there is no difference in the proportion of hard-to-read sentences between pre- and post-intervention cohorts. I don't understand your last question. – Pedro Cunha Dec 28 '23 at 12:45
  • Ah, I thought that these texts were something like two different cases that you controlled, and that you did some measurements with them, like having people read them and measuring speed. But it is the sentences themselves that are the observed random variables. – Sextus Empiricus Dec 28 '23 at 13:08
  • A related question about testing with ratios is https://stats.stackexchange.com/questions/398436/a-b-testing-ratio-of-sums – Sextus Empiricus Dec 28 '23 at 14:02

1 Answer


Fractions like these can be treated as binomial random variables. To compare two binomial random variables, the usual approach is Pearson's chi-squared test.

Also, in order to get a single result from all your data, you may want to sum the counts across the different groups (subject/author), so that you end up with:

  • the total number of sentences before review
  • the total number of sentences after review
  • the total number of understandable sentences before review
  • the total number of understandable sentences after review.

On these aggregated counts, you can run a chi-squared test.
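As a concrete sketch, using the aggregated counts the asker posted in the comments below (124 hard sentences out of 627 before, 127 out of 714 after), the test can be run with SciPy:

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows are hard / not-hard sentences,
# columns are before / after the readability intervention.
# Counts are the aggregated totals from the comments (124/627 vs 127/714).
table = [
    [124, 127],              # hard to read
    [627 - 124, 714 - 127],  # not hard to read
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; for these counts the p-value is well above 0.05, matching the conclusion in the comments that the difference is non-significant.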

Caveat

The binomial r.v. assumes that each sentence's readability is independent of the adjoining sentences' readability. This may not hold perfectly, and if so, it may inflate the significance of your result: positive dependence between neighbouring sentences makes the counts more variable than the binomial model assumes. This phenomenon is called overdispersion (its opposite is underdispersion), and it's not easy to solve, so I won't go into it here.
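One quick, informal check for this is the Pearson dispersion ratio: compare the variability of the per-text counts against what a single common binomial proportion would predict. A minimal sketch, using the four "before" fractions from the question as toy data (a real check would use all texts in a cohort):

```python
import numpy as np

# Toy data: the "before" fractions from the question
# (hard sentences / total sentences per text).
hard = np.array([1, 3, 2, 0])
total = np.array([11, 14, 8, 10])

# Pooled proportion under the null of one common p across texts.
p = hard.sum() / total.sum()

# Pearson X^2 against the pooled binomial, divided by its degrees of
# freedom; ratios well above 1 suggest overdispersion.
x2 = np.sum((hard - total * p) ** 2 / (total * p * (1 - p)))
dispersion = x2 / (len(hard) - 1)
print(f"dispersion ratio = {dispersion:.2f}")
```

With only four texts this is merely illustrative, but on the full cohorts a ratio far above 1 would warn that a plain chi-squared test overstates significance.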

Other options

  • If you do some research, you will find that Pearson's chi-squared test has alternatives, such as Fisher's exact test or the G-test. These also apply to your case, in case you want to consider them, though they will almost certainly all give you the same result anyway.
  • If you want to avoid adding up all the data from the different groups, there are more complicated alternatives to the simple test I suggested. You can fit a mixed-effects logit model, with group as a random effect and review (before/after) as a fixed effect. You could also, in theory, sum the test statistics from many chi-squared tests in order to aggregate them; this works on paper but is fairly heterodox, so I don't recommend it.
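For reference, both of the simpler alternatives mentioned above are available in SciPy. A sketch on the same aggregated 2x2 table of counts used earlier (hard vs. not-hard, before vs. after):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Aggregated 2x2 table: rows are hard / not-hard sentences,
# columns are before / after review.
table = [[124, 127], [503, 587]]

# Fisher's exact test: exact p-value, no large-sample approximation.
odds_ratio, p_fisher = fisher_exact(table)

# G-test: the likelihood-ratio flavour of the chi-squared test,
# obtained via the lambda_ parameter of chi2_contingency.
g, p_g, dof, _ = chi2_contingency(table, lambda_="log-likelihood")

print(f"Fisher p = {p_fisher:.3f}, G-test p = {p_g:.3f}")
```

As the answer notes, the three tests agree closely here; they only meaningfully diverge for small expected counts.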
carlo
  • Group 1 (before modification): 124/627. Group 2 (after): 127/714.

            before  after
      hard     124    127
      total    627    714

    – Pedro Cunha Dec 28 '23 at 13:40
  • The caveat seems like a big issue. Using a t-test that estimates the variance based on the observed variance in place of using the binomial distribution that uses a variance based on the observed mean would be better. – Sextus Empiricus Dec 28 '23 at 13:58
  • @PedroCunha considering those numbers, the proportion of hard-to-read sentences is amazingly insensitive to the modification, and it even decreased a tiny bit. The test will obviously come out as non-significant, I'm sorry (supposing you were rooting for the opposite result). – carlo Dec 28 '23 at 16:24
  • @SextusEmpiricus I don't think so; most people wouldn't mind. Of course one can always dig some more into the data to find out what's going on in terms of autocorrelation, variance between groups, and so on. Hell, one can go over the whole dataset if they see the benefit of it, but in terms of delivered results, a chi-squared test could do the trick in most cases here IMO. – carlo Dec 28 '23 at 16:25
  • @carlo it will depend on the situation. Is it good advice to assume independence for a problem that does not have this independence? Just because it is count data doesn't mean that you can use a chi-squared test. Even if you have independence, when you sum up the results from different texts you get a sum of binomial distributions with different $p$ parameters. – Sextus Empiricus Dec 28 '23 at 17:05