Distributional choices for sparse 0,1 data

Asked Jan 31 '24 at 17:26

Active Jan 31 '24 at 20:39

I am using GAMs to model the relationship between a binary response variable (0 or 1) and several continuous fixed and random explanatory variables. A binomial distribution seems to be the standard choice, but the data are quite sparse: there are 21,000 0s and only about 800 1s.

I'm curious whether there are any other distributions I should consider that might be better suited to these data. I'm intrigued by the beta-binomial, but it's my understanding that it's for data bounded between 0 and 1.
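To make the setup concrete, here is a minimal sketch under stated assumptions, not the actual model. It simulates binary data with roughly the same imbalance (about 21,800 rows, a few percent of them 1s) and fits a Bernoulli model, which is the binomial with one trial per row, by Newton-Raphson. A plain logistic regression with a single linear term stands in for the GAM's smooth terms, and there are no random effects. The covariate `x`, the coefficient values, and the function names `simulate` and `fit_logistic` are all hypothetical.

```python
import math
import random


def simulate(n, beta0, beta1, seed=0):
    """Draw n (x, y) pairs with P(y = 1 | x) = sigmoid(beta0 + beta1 * x)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Maximum-likelihood fit of intercept and slope by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)      # Bernoulli variance at the current fit
            r = y - p              # score contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Assumed "true" values, chosen only so that a few percent of outcomes are 1
xs, ys = simulate(21800, beta0=-3.6, beta1=0.8)
b0_hat, b1_hat = fit_logistic(xs, ys)
print("share of 1s:", sum(ys) / len(ys))
print("estimated intercept and slope:", b0_hat, b1_hat)
```

In this sketch the low base rate shows up in the intercept; the likelihood itself is the ordinary Bernoulli one.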

edited Jan 31 '24 at 20:39

Andrew

Welcome to Cross Validated! Each outcome is a flip of a coin (possibly a different coin each time), right? – Dave Jan 31 '24 at 17:33

Hello! If your outcome is binary (0 or 1, as you say), I am not sure what you mean by overdispersion here, or why you would need a different model. Do you mean that you have few 1s compared to 0s, and therefore your data are sparse? – jmarkov Jan 31 '24 at 17:52
This question got me to revisit a question of mine from last year, and I believe this to be a duplicate of that. (I sure wish some of those comments would have been posted as answers, though.) – Dave Jan 31 '24 at 18:48
@jmarkov thanks for pointing that error out! Yes, I meant sparse (i.e., few 1s compared to 0s) – Andrew Jan 31 '24 at 20:40

0 Answers