
Hello: I am interested in modelling the purchase rate over time (an integer number of purchases, observed daily, for a number of shoppers) for a collection of 3 different products.

The data look a bit like this:

Person ID  Date  Product ID  Product Name               Purchase Count
A          Mon   X           craft beer 6pack           0
A          Mon   Y           24ct case generic lager    0
A          Mon   Z           750mL bottle               1
A          Tue   X           craft beer 6pack           0
A          Tue   Y           24ct case generic lager    2
A          Tue   Z           750mL bottle               0

Because the outcome of interest is a rate based on a count variable that includes true zeroes, something like Poisson regression appears to be a compelling option. However, I'm struggling to decide how to handle the fact that each product has a different case count/volume (e.g., a 6-pack of 12oz cans is ~2129 mL of beer, so I would expect this item to be purchased less frequently than a 750mL bottle, all else equal).

Some of my colleagues have suggested converting all of the products to a common measurement (e.g., mL), but this results in non-integer values because of the conversion factor. In response, I considered switching to a Gaussian model, but this is not ideal because the data are still very granular and bounded below by zero (values like {0, 1, 2} become {0, 2129.29, 4258.59}).

One option that seems compelling, but that I am unsure about, is to keep the integer purchase counts as the outcome and use the offset term to capture the unit information about each product (e.g., use the log of the mL-per-purchase, or of its inverse, as the offset).
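To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes the goal is a rate in mL, so that E[count] × mL-per-unit = exp(b0 + b1·x), which puts the log of the inverse volume, -log(mL per unit), in the offset. The 8517.18 mL figure for product Y assumes 24 × 12oz cans, the Tuesday indicator is just a placeholder covariate, and `fit_poisson_offset` is a hypothetical helper written for illustration (a small Newton-Raphson fit), not a library function. In practice a GLM routine that accepts an offset argument would do this job.

```python
import math

# Volume (mL) per purchased unit. X follows the 6 x 12oz figure above;
# Y is an assumption (24 x 12oz cans); Z is the 750 mL bottle.
ML_PER_UNIT = {"X": 2129.29, "Y": 8517.18, "Z": 750.0}

# (day, product, purchase count) rows from the sample table above.
ROWS = [
    ("Mon", "X", 0), ("Mon", "Y", 0), ("Mon", "Z", 1),
    ("Tue", "X", 0), ("Tue", "Y", 2), ("Tue", "Z", 0),
]


def fit_poisson_offset(x, y, offset, n_iter=25):
    """Newton-Raphson fit of log E[y] = b0 + b1 * x + offset."""
    # Start at the intercept-only MLE so the first Newton step stays small.
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi + oi) for xi, oi in zip(x, offset)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton update: solve H * step = g for the 2x2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = [1.0 if day == "Tue" else 0.0 for day, _, _ in ROWS]
y = [count for _, _, count in ROWS]
# Offset is log(1 / mL per unit): the coefficient on it is fixed at 1.
offset = [-math.log(ML_PER_UNIT[product]) for _, product, _ in ROWS]

b0, b1 = fit_poisson_offset(x, y, offset)
print("baseline expected mL per row:", math.exp(b0))
print("Tuesday vs Monday rate ratio:", math.exp(b1))
```

One thing I noticed while sketching this: if the model also includes a free indicator for each product, the offset is constant within product and is simply absorbed into those product coefficients, so it only changes the fit when products share parameters, as they do here.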

Has anybody ever used such a technique? If so, how did it work? What exactly did you use as the offset and how did you construct your model?

G. Vece
  • That should be fine (a quasi-Poisson model might be better, but the offset is used the same way). For details see https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353 – kjetil b halvorsen Dec 10 '23 at 22:42
