I have a dataset with count data (y variable), one primary variable of interest (factor denoted x_1), and several variables I wish to control for (factor and numeric denoted x_i).

Say, y is the number of seats taken in a stadium. And let's say that this stadium has three levels of seating - low price (many), medium price (fewer), and high price (fewest).

A paper similar to the one I'm writing ran 4 separate Poisson regressions on the same set of x variables, with y as #all seats, #low priced seats, #medium priced seats, and #high priced seats respectively, and compared the coefficient of x_1 across the models to test its effect on the number of low, medium, and high priced seats taken. (The total number of seats equals the sum of low, medium, and high.)

I'm wondering whether there are two possible problems with this:

  1. The regressions all use the raw count of seats and not the count divided by the number of seats in each section. Is this wrong?
  2. Is it reasonable to use 4 separate regressions in this way, or is there a more legitimate method?
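To make point 1 concrete, here is a minimal sketch of why the raw-count choice may matter less than it first appears within any one of the separate regressions. It assumes the number of seats in a section is the same for every observation, and it uses made-up counts, a binary x_1, and no control variables so that the Poisson maximum-likelihood estimate has a closed form. The function name `rate_ratio` and all numbers are hypothetical. With a log link, a constant offset log(capacity) is just a constant added to the linear predictor, so it is absorbed by the intercept and the coefficient of x_1 does not change.

```python
import math


def rate_ratio(y, x, exposure):
    """Closed-form Poisson MLE of exp(beta_1) for one binary predictor x
    with offset log(exposure): the ratio of (total count / total exposure)
    between the x == 1 and x == 0 groups."""
    count_1 = sum(yi for yi, xi in zip(y, x) if xi == 1)
    count_0 = sum(yi for yi, xi in zip(y, x) if xi == 0)
    expo_1 = sum(ei for ei, xi in zip(exposure, x) if xi == 1)
    expo_0 = sum(ei for ei, xi in zip(exposure, x) if xi == 0)
    return (count_1 / expo_1) / (count_0 / expo_0)


# Hypothetical data: medium-priced seats taken at six events.
y = [40, 55, 48, 20, 30, 25]
x = [0, 0, 0, 1, 1, 1]   # binary x_1 (assumption, for the closed form)
capacity = 100           # hypothetical, identical for every observation

rr_raw = rate_ratio(y, x, [1] * len(y))            # raw counts, no offset
rr_offset = rate_ratio(y, x, [capacity] * len(y))  # offset log(capacity)

print(rr_raw, rr_offset)
print(math.isclose(rr_raw, rr_offset))
```

If capacity varied between observations within a section, the two estimates would generally differ, and the offset would then matter for the coefficient as well as the intercept.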

As this is a theory question I think (hope) it's ok not to provide data.

Edit: Just to explain a little more, say the exponentiated coefficient for x_1 is 0.7, 0.8, 0.5, and 0.9 in the #all seats, #low priced seats, #medium priced seats, and #high priced seat regressions respectively. This is interpreted as suggesting that x_1 has the greatest effect on medium-priced seats.
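As a small sketch of how these numbers are being read: an exponentiated Poisson coefficient is a rate ratio, so a value below 1 corresponds to a proportional reduction in the expected count. The snippet below only converts the illustrative values quoted above into percent changes and picks the one furthest from "no effect" on the log scale. It says nothing about whether the differences between models are statistically distinguishable; the names used are hypothetical.

```python
import math

# Exponentiated coefficients of x_1 from the four regressions
# (the illustrative values quoted in the edit above).
rate_ratios = {"all": 0.7, "low": 0.8, "medium": 0.5, "high": 0.9}


def percent_change(rate_ratio):
    """Percent change in the expected count implied by a rate ratio."""
    return (rate_ratio - 1.0) * 100.0


changes = {name: round(percent_change(rr), 1) for name, rr in rate_ratios.items()}

# The largest proportional effect is the coefficient furthest from 0,
# i.e. the rate ratio furthest from 1 on the log scale.
largest = max(rate_ratios, key=lambda name: abs(math.log(rate_ratios[name])))

print(changes)
print(largest)
```

Note that this is a comparison of proportional effects. Because the sections hold different numbers of seats, the section with the largest proportional change need not be the one with the largest change in the absolute number of seats taken.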

steve
  • Please focus on question (2), because (1) is addressed in a very great many posts here on CV: a Poisson response is a count variable, not a proportion; and using the proportion for the response wipes out important information about its variance. – whuber Feb 01 '24 at 22:35
  • It is almost always better to use one model; see https://stats.stackexchange.com/questions/373890/separate-models-vs-flags-in-the-same-model/373909#373909. Use type of seat as a factor variable. – kjetil b halvorsen Feb 02 '24 at 02:33
  • Thanks kjetil. Not sure why the paper I referred to used 4 regressions in this way. My question is really, were the comparisons legitimate? Does the smaller coefficient legitimately mean that x_1 has the greatest effect on, for example, medium priced seats? – steve Feb 02 '24 at 22:26
  • I have a feeling my question is very poorly phrased. I could try to put a dataset together and post it if that would help. – steve Feb 02 '24 at 22:27
  • Thanks whuber. I don't quite understand what you mean. Poisson regression is commonly used to model counts or rates; when rates are modelled, an offset is used. The variance would be affected, of course, but wouldn't directly comparing counts of cheap seats (many exposures) with counts of expensive seats (few exposures) be misleading? Also, there is more than one stadium, and each one has the same number of seats in each category. – steve Feb 02 '24 at 22:37
