I can add that the variance was more than twice the mean. I looked into over-dispersed Poisson models, but traditionally my field leans heavily on normal models. Is there a way to decide on the type of model?

Edit to add more information based on comments:

I had two groups of participants find a target. I measured distance travelled (the program I used reports distance in arbitrary units). Each participant was measured over three trials. I predicted a group by trial interaction (I expected one group to improve more than the other).

I anticipated fitting a multilevel model with lmer in R:

    model <- lmer(distance ~ covariate + trial*group + (1 | participant), 
                        data = df)

Here is the distribution of all scores:

[Figure: distribution of all scores]

and the distribution by group:

[Figure: distribution of scores by group]

I guess my major concern was that I wasn't sure whether it was appropriate to assume a normal distribution. The residual plot did not seem 'random' to me, which is what caused my hesitation.

[Figure: residual plot]

While I am this far in, my secondary question is: can I include the effect of trial as a random slope without including the interaction as a random effect, such as:

    model <- lmer(distance ~ covariate + trial*group + (1 + trial | participant), 
                        data = df)
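To make the comparison concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the sample size, the effect sizes, and the variable values are made up and are not my data. It fits the random-intercept model and the random-slope model with the same fixed effects and compares them.

```r
# Sketch only: simulated data, not the real study data.
library(lme4)

set.seed(3)
n_participants <- 60
df <- expand.grid(participant = factor(seq_len(n_participants)), trial = 1:3)
id <- as.integer(df$participant)
df$group <- factor(ifelse(id <= n_participants / 2, "A", "B"))
df$covariate <- rnorm(n_participants)[id]

# Participant-specific intercepts (b0) and trial slopes (b1); values are hypothetical
b0 <- rnorm(n_participants, sd = 10)
b1 <- rnorm(n_participants, sd = 4)
df$distance <- 100 + b0[id] + 2 * df$covariate +
  (-8 + b1[id] - 5 * (df$group == "B")) * df$trial +
  rnorm(nrow(df), sd = 5)

# Same fixed effects, different random-effects structure
m_int   <- lmer(distance ~ covariate + trial * group + (1 | participant), data = df)
m_slope <- lmer(distance ~ covariate + trial * group + (1 + trial | participant), data = df)

# Likelihood-ratio comparison (anova refits both models with ML by default)
anova(m_int, m_slope)
```

With only three trials per participant, the random-slope model may produce singular-fit or convergence warnings on real data, so the comparison is worth checking rather than taking for granted.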

Thank you!

Zach H
  • About the only thing one can conclude from the information given is that you have no basis to adopt a Poisson distribution to model this. Ordinarily, one considers using a Poisson distribution for counted data. Distances are not counts; they could have just about any distribution with positive support. What is a "beatbox model"?? – whuber Jun 27 '19 at 18:31
  • Sorry, I edited it. It was supposed to say type of, not beat box! Thank you for your feedback about counts. I suppose I didn’t quite get the idea of counts. I assumed that I was counting distance (as I’m counting steps), given that the results could only be whole numbers. Thank you! – Zach H Jun 27 '19 at 18:39
  • An over-dispersed Poisson model is not actually Poisson, and the values are not integers; see https://stats.stackexchange.com/a/86052/99274. Your actual question is too broad for any concise answer. Deciding which model is appropriate without specifying what the data are requires an answer that is book length. To narrow this down to some pragmatic request, maybe try including some data, or enough data characteristics that any potential answer is not as broad as the Mississippi. – Carl Jun 27 '19 at 18:57
  • Sorry, I am new to posting here. I will try to add some more info to the original post. I appreciate the added info, and I appreciate any guidance to how I should post questions! – Zach H Jun 27 '19 at 19:09

1 Answer

There is no simple answer to your question. The choice of model usually depends a lot on the type of data you have and the context of the study. If your field leans heavily on normal models, there might be a reason for it (cf. domain expertise).

Having said that, it looks like your data take positive, discrete values, so a discrete distribution defined for positive values might be a good start. As was mentioned in the comments, a Poisson distribution is usually used for count data. If you actually count the number of steps, and the variance is greater than the mean, you could use a negative binomial distribution, for example.
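As a rough illustration of the negative binomial option, here is a minimal sketch that keeps the fixed and random structure from the question. The data are simulated and every number in the simulation is an assumption; only the model formula comes from the question. It uses `glmer.nb` from lme4.

```r
# Sketch only: simulated step counts, not the asker's data.
library(lme4)

set.seed(1)
n_participants <- 40
df <- expand.grid(participant = factor(seq_len(n_participants)), trial = 1:3)
id <- as.integer(df$participant)
df$group <- factor(ifelse(id <= n_participants / 2, "A", "B"))
df$covariate <- rnorm(n_participants)[id]

# Overdispersed counts: negative binomial around a participant-specific mean
u  <- rnorm(n_participants, sd = 0.3)  # hypothetical random intercepts
mu <- exp(4 - 0.2 * df$trial - 0.1 * df$trial * (df$group == "B") + u[id])
df$distance <- rnbinom(nrow(df), mu = mu, size = 2)

# Crude overdispersion check: variance-to-mean ratio of the raw counts
var(df$distance) / mean(df$distance)

# Negative binomial mixed model with the structure from the question
nb_model <- glmer.nb(distance ~ covariate + trial * group + (1 | participant),
                     data = df)
summary(nb_model)
```

This only makes sense if the response really is a count (number of steps). The raw variance-to-mean ratio ignores the predictors, so it is a first look rather than a formal test.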

Depending on the context, it could also be appropriate to treat discrete values as continuous data, in which case you could use lognormal, gamma or inverse-gamma distributions, just to name a few. Also, if there is an upper bound on the number of steps, you might want to take that into account as well.
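A minimal sketch of two of these continuous options, again with the model structure from the question. The data are simulated (lognormal by construction) and all numeric values are assumptions for illustration. The lognormal model is fitted as a normal mixed model on the log scale, and the gamma model as a GLMM with a log link.

```r
# Sketch only: simulated positive, right-skewed distances.
library(lme4)

set.seed(2)
n_participants <- 40
df <- expand.grid(participant = factor(seq_len(n_participants)), trial = 1:3)
id <- as.integer(df$participant)
df$group <- factor(ifelse(id <= n_participants / 2, "A", "B"))
df$covariate <- rnorm(n_participants)[id]

u <- rnorm(n_participants, sd = 0.3)  # hypothetical random intercepts
log_mean <- 4 - 0.2 * df$trial - 0.1 * df$trial * (df$group == "B") + u[id]
df$distance <- exp(rnorm(nrow(df), mean = log_mean, sd = 0.4))

# Option 1: lognormal, i.e. a normal mixed model for log(distance)
lognormal_model <- lmer(log(distance) ~ covariate + trial * group + (1 | participant),
                        data = df)

# Option 2: gamma GLMM with a log link
gamma_model <- glmer(distance ~ covariate + trial * group + (1 | participant),
                     data = df, family = Gamma(link = "log"))

# Residuals versus fitted values for the log-scale model
plot(fitted(lognormal_model), resid(lognormal_model),
     xlab = "Fitted", ylab = "Residual")
```

The two fits use different response scales, so their AIC values are not directly comparable without adjusting for the log transform. Residual plots for each model are a more direct way to see which one behaves better.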

Guillem
  • Where do you see evidence that the data are discrete? I'd be inclined to believe a priori that distance would be measured as a continuous variable. – klumbard Jun 27 '19 at 19:25
  • Thank you, I added (much) more detail to my post. The data has a lower-bound, but no upper-bound. I'll have to research the other distributions you named, thank you for your help! – Zach H Jun 27 '19 at 19:26
  • If the distance was counted as a number of steps, then it could be viewed as discrete, but with the new information available, it seems that a continuous variable could be appropriate as well. – Guillem Jun 27 '19 at 19:33
  • Looking at the graph, I notice two things. First, the distribution has a long tail. Second, when looking at the distribution by group, it looks bimodal. If that makes sense given the context of the study, you could consider a mixture of normals as well. – Guillem Jun 27 '19 at 19:35