
I ran an aggregated binomial regression with a non-binary, bounded count variable as the outcome: the number of counseling sessions attended (out of a maximum possible six) during a clinical trial. So participant 1 may have attended 0/6 sessions, participant 2 4/6, participant 3 1/6, and so on. This was not a Poisson regression, which is for count outcomes where there is no known upper limit for the count.

The exponentiated log-odds coefficient (i.e. the odds ratio) for the predictor 'treatment group' was 1.43.

Now if this were a binary/Bernoulli outcome, I would report this with something like 'the odds of people in the experimental group attending a counseling session were 43% higher than for people in the placebo group'.

But how do I do this for a non-binary bounded count outcome (i.e. a count outcome where the maximum possible count is known)? It's still an odds ratio, but I am just not sure how to word it. All the online reporting guides I have found are for logistic regression with Bernoulli-type outcomes.

llewmills
  • Please provide more details about how you performed the analysis. Did you do a binomial logistic regression with the fraction of attended sessions as the outcome (and the number of total sessions available to each participant as a weight), or did you do a Poisson regression for the count of attended sessions, or... ? Please provide that information by editing the question, as comments are easy to overlook and can be deleted. – EdM Jul 19 '22 at 03:30
  • Thank you for replying @EdM. I did not do a Poisson regression. As I wrote, I did an aggregated binomial regression. I might be confused about nomenclature here. I thought logistic regression was only for binary [0,1] outcomes, and that when the outcome is a count variable where the set size is known it is called aggregated binomial regression. But you may be right that, rather than the way I described it, the outcome is actually the fraction of total sessions attended, with the number of available sessions added for each participant as a weight. Perhaps you can help clarify how I should be describing it? – llewmills Jul 19 '22 at 10:57
  • Should I instead be calling what I did 'weighted logistic regression', as explained here: https://stats.stackexchange.com/questions/442796/what-does-weighted-logistic-regression-mean#:~:text=Weighted%20logistic%20regression%20is%20used,Such%20datasets%20are%20generally%20imbalanced. – llewmills Jul 19 '22 at 11:14
  • Since a binomial distribution with a fixed set size N is equivalent to N iid Bernoulli trials, the regression should be equivalent to a normal logistic regression with each datum replaced by N data with 0/1 values. That serves as a cross check and also shows that odds ratios are still a good way to report the result. – jwimberley Jul 19 '22 at 11:32
  • Are you talking about a NEGATIVE binomial regression model? The negative binomial model is used for count data and is an extension of the Poisson model, but does not assume that the variance is equal to the mean. – Graham Wright Jul 19 '22 at 12:32
  • @Graham Wright no, I do not mean negative binomial. I believe the Poisson is a special case of the negative binomial, not the other way around, and both are traditionally used for unbounded counts (where the upper limit of the count is not known). What I am calling 'aggregated binomial regression' is for counts where the maximum of the count is known. I think EdM was right to talk about weights. I know to report it with odds ratios, but I still don't know how best to report the odds ratios when the outcome is a count rather than a single [0,1] trial (i.e. a Bernoulli outcome). – llewmills Jul 20 '22 at 16:03

1 Answer


From the question and comments, this is simply a binomial logistic regression, as @jwimberley suggests in a comment.

You are modeling the probability of attending a counseling session, with each individual given 6 chances to attend and with no attempt to model differences in attendance over sessions/time. That's modeled as a binomial distribution: the distribution of the number of successes over a defined number of trials with some constant probability of success. There's no need to restrict the data setup to the form of individual 0/1 Bernoulli trials.
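
One way to see why the two data setups give the same fit is to compare their likelihoods. The notation here is introduced only for illustration and is not from the question: $y_i$ is the number of sessions attended by participant $i$ out of $n_i = 6$, $z_{ij}$ is the 0/1 indicator for session $j$ (so $\sum_j z_{ij} = y_i$), and $p_i$ is the per-session attendance probability implied by the coefficients $\beta$.

```latex
\begin{aligned}
L_{\text{binomial}}(\beta)
  &= \prod_i \binom{n_i}{y_i}\, p_i^{\,y_i} (1 - p_i)^{\,n_i - y_i}, \\
L_{\text{Bernoulli}}(\beta)
  &= \prod_i \prod_{j=1}^{n_i} p_i^{\,z_{ij}} (1 - p_i)^{\,1 - z_{ij}}
   = \prod_i p_i^{\,y_i} (1 - p_i)^{\,n_i - y_i}.
\end{aligned}
```

The two expressions differ only by the factor $\prod_i \binom{n_i}{y_i}$, which does not involve $\beta$, so they are maximized by the same coefficients and yield the same odds ratios.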

The details of how you did the analysis aren't clear, but in R you could have properly used either of the following ways to present such data to glm() for a logistic regression (see the R manual page for family() in the stats package):

As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights)

As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.

That allows the model to take the total number of observations into account. You can report log-odds or odds ratios however you wish. There's no need to qualify this as a "weighted" regression. It's just that you presented the data to the software in binomial rather than Bernoulli form.
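
As a cross check of that equivalence, here is a minimal sketch in plain Python rather than R. Everything in it is an assumption made for illustration: the eight participants and their attendance counts are invented, and `fit_logistic` is a hypothetical hand-rolled Newton-Raphson fitter for one predictor, not the routine `glm()` uses internally. It fits the same model to the aggregated (successes, failures) form and to the expanded 0/1 form.

```python
import math

def fit_logistic(rows, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson.

    rows: list of (x, successes, failures) tuples.
    A Bernoulli row is simply (x, 1, 0) or (x, 0, 1).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, f in rows:
            n = s + f
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = s - n * p            # score contribution
            w = n * p * (1.0 - p)    # information contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

SESSIONS = 6
# Invented data: (group, sessions attended); group 1 = experimental
participants = [(0, 0), (0, 2), (0, 3), (0, 1),
                (1, 4), (1, 1), (1, 3), (1, 5)]

# Aggregated binomial form: one row per participant
aggregated = [(x, y, SESSIONS - y) for x, y in participants]

# Expanded Bernoulli form: six 0/1 rows per participant
expanded = ([(x, 1, 0) for x, y in participants for _ in range(y)] +
            [(x, 0, 1) for x, y in participants for _ in range(SESSIONS - y)])

or_agg = math.exp(fit_logistic(aggregated)[1])
or_exp = math.exp(fit_logistic(expanded)[1])
print(or_agg, or_exp)
```

Both fits return the same odds ratio for the group effect. With a single binary predictor it also equals the ratio of the pooled per-session odds in the two groups, which is one reason "per session" wording is natural.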

EdM
  • Thank you @EdM. So you are saying that logistic regression does not imply single observations of a Bernoulli trial? That's interesting. – llewmills Jul 21 '22 at 07:37
  • However, I am still looking for a way to report the results that can be understood by non-statisticians (i.e. almost everyone). For a logistic regression with a Bernoulli outcome (e.g. attending counselling once or more during the trial vs not attending) it makes sense to say 'the odds of people in the experimental group attending one or more counseling sessions were 43% higher than in the placebo group'. But when it's a count, how do you do it: 'the odds of people in the experimental group attending each extra counselling session were 43% higher'? Does that make sense? Sounds weird to me. – llewmills Jul 21 '22 at 07:41
  • @llewmills maybe that's why the name is "binomial" instead of "Bernoulli" regression. For phrasing, with all sessions treated as equivalent, you could say something like "the odds of attending any given session." Alternatively, if all participants had 6 opportunities to attend, you could perform an ordinal logistic regression on the counts for the odds of attending one extra session. I wonder if treating all sessions as equivalent makes sense; you might consider including the session number as a predictor in a mixed model. – EdM Jul 21 '22 at 12:39
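
For a lay audience it can also help to translate "the odds of attending any given session" into per-session probabilities. The sketch below is only an illustration: the odds ratio of 1.43 comes from the question, but the 50% placebo attendance probability is a made-up baseline, and `apply_odds_ratio` is a hypothetical helper name.

```python
def apply_odds_ratio(p_control, odds_ratio):
    """Return the probability obtained by scaling the odds of p_control."""
    odds_control = p_control / (1.0 - p_control)
    odds_treated = odds_control * odds_ratio
    return odds_treated / (1.0 + odds_treated)

ODDS_RATIO = 1.43   # reported in the question
P_CONTROL = 0.50    # hypothetical per-session attendance in the placebo group
SESSIONS = 6

p_treated = apply_odds_ratio(P_CONTROL, ODDS_RATIO)
expected_control = SESSIONS * P_CONTROL   # expected sessions, placebo
expected_treated = SESSIONS * p_treated   # expected sessions, experimental
print(round(p_treated, 3), round(expected_control, 2), round(expected_treated, 2))
```

Under that assumed baseline the per-session probability in the experimental group is 1.43/2.43, roughly 0.59, or about 3.5 expected sessions versus 3.0. A different baseline gives different probabilities, so the placebo attendance rate estimated from the actual data should be substituted.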