I have data like this:
group length
1 5
1 5
1 2
1 3
1 5
1 5
1 3
1 2
1 5
1 3
2 3
2 3
2 3
2 3
2 5
2 2
2 5
2 3
2 3
2 3
I would like to get the probability that length takes each of its values (2, 3, 5), separately for each group. I would like to get this with regression. Transformations of the data are fine if required. I am using Stata right now, but any explanation or pseudo-code is greatly appreciated.
To illustrate what I mean, here is how I would do this manually:
1. Transform the data to get counts by length for each group, calculate the total, and calculate the probabilities:
group  length_2_N  length_3_N  length_5_N  Total  prob_2  prob_3  prob_5
1      2           3           5           10     .2      .3      .5
2      1           7           2           10     .1      .7      .2
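The manual step above can be sketched in a few lines of Python, standing in for Stata here. `proportions_by_group` is a hypothetical helper name, and the data are the 20 rows listed in the question. In Stata, `tabulate group length, row` should display the same row percentages.

```python
from collections import Counter

# The example data from the question as (group, length) pairs.
DATA = [
    (1, 5), (1, 5), (1, 2), (1, 3), (1, 5), (1, 5), (1, 3), (1, 2), (1, 5), (1, 3),
    (2, 3), (2, 3), (2, 3), (2, 3), (2, 5), (2, 2), (2, 5), (2, 3), (2, 3), (2, 3),
]


def proportions_by_group(data):
    """Return {group: {length: share of that group's rows with this length}}."""
    counts = {}
    for group, length in data:
        # One Counter of lengths per group (the length_x_N columns).
        counts.setdefault(group, Counter())[length] += 1
    result = {}
    for group, counter in counts.items():
        total = sum(counter.values())  # the Total column
        result[group] = {length: n / total for length, n in sorted(counter.items())}
    return result


if __name__ == "__main__":
    for group, probs in proportions_by_group(DATA).items():
        print(group, probs)
```

Running it prints `1 {2: 0.2, 3: 0.3, 5: 0.5}` and `2 {2: 0.1, 3: 0.7, 5: 0.2}`, matching the prob_2, prob_3, prob_5 columns.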
What I want is to get the .2, .3, .5 and .1, .7, .2 from a regression. It is fine if I need to split the data by group and run two regressions. Any hints?
I think what I basically want is to estimate $P(\text{length} = x) = \alpha_x$ for each $x \in \{2, 3, 5\}$ (for each group). Additionally, it would be useful to estimate $P(\text{length} = x) = \alpha_x + \beta_x \cdot \text{group}$.
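One direct way to get these numbers from a regression is a linear probability model: for each x, regress the 0/1 indicator 1{length = x} on group by ordinary least squares. (Regressing that indicator on a constant alone, within one group, gives the group's proportion as the intercept.) The sketch below fits $\alpha_x + \beta_x \cdot \text{group}$ exactly with `fractions.Fraction`; `ols_line` and `linear_probability_model` are hypothetical helper names, not library functions. In Stata the analogous fit should be something like `generate len3 = (length == 3)` followed by `regress len3 group`.

```python
from fractions import Fraction

# The example data from the question as (group, length) pairs.
DATA = [
    (1, 5), (1, 5), (1, 2), (1, 3), (1, 5), (1, 5), (1, 3), (1, 2), (1, 5), (1, 3),
    (2, 3), (2, 3), (2, 3), (2, 3), (2, 5), (2, 2), (2, 5), (2, 3), (2, 3), (2, 3),
]


def ols_line(xs, ys):
    """Exact OLS fit of y = alpha + beta * x, solving the 2x2 normal equations
    n*alpha + sx*beta = sy and sx*alpha + sxx*beta = sxy by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # nonzero as long as x takes at least two values
    alpha = Fraction(sy * sxx - sx * sxy, det)
    beta = Fraction(n * sxy - sx * sy, det)
    return alpha, beta


def linear_probability_model(data, outcomes=(2, 3, 5)):
    """For each outcome x, regress the indicator 1{length == x} on group,
    i.e. fit P(length = x) = alpha_x + beta_x * group by OLS."""
    groups = [g for g, _ in data]
    fits = {}
    for x in outcomes:
        indicator = [1 if length == x else 0 for _, length in data]
        fits[x] = ols_line(groups, indicator)
    return fits


if __name__ == "__main__":
    for x, (alpha, beta) in linear_probability_model(DATA).items():
        # Fitted values at groups 1 and 2; group 3 is an extrapolation.
        fitted = [float(alpha + beta * g) for g in (1, 2, 3)]
        print(f"x={x}: alpha={float(alpha)}, beta={float(beta)}, fitted={fitted}")
```

With only two groups this model is saturated, so the fitted values at groups 1 and 2 reproduce .2/.3/.5 and .1/.7/.2 exactly (for x = 3, $\alpha_3 = -1/10$ and $\beta_3 = 4/10$). Extrapolating that same line to group 3 gives $-1/10 + 3 \cdot 4/10 = 11/10$, which is not a valid probability.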
group refers to an interval of time, for example a month. Suppose that I actually have many groups. In my regression, I would also like to fit a linear time trend so that I can extrapolate to future months. – bill999 Jun 02 '15 at 21:18
The last sentence, where I said it would also be useful to estimate $P(\text{length} = x) = \alpha_x + \beta_x \cdot \text{group}$, refers to adding the time trend to the regression. – bill999 Jun 02 '15 at 21:27
… group, especially if group will later represent a time: such models tend to forecast (or even fit) mathematically invalid probabilities eventually. The standard solution is multinomial regression. Perhaps that's what you're looking for? (I describe this model and interpret its coefficients in the second half of an answer at http://stats.stackexchange.com/a/17203.) – whuber Jun 02 '15 at 21:32
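whuber's suggestion, multinomial (logit) regression, models the log-odds of each length relative to a baseline length as linear in the time trend, so fitted and extrapolated probabilities always lie between 0 and 1 and sum to 1. Below is a minimal sketch that fits it by plain gradient ascent on the log-likelihood, assuming length 2 as the baseline category and centering group at its mean to speed convergence. `fit_mlogit`, `predict_probs`, and the step size are illustrative choices, not a packaged routine. In Stata, the built-in `mlogit` command (e.g. `mlogit length group`) fits this kind of model with a proper optimizer and standard errors.

```python
import math

# The example data from the question as (group, length) pairs.
DATA = [
    (1, 5), (1, 5), (1, 2), (1, 3), (1, 5), (1, 5), (1, 3), (1, 2), (1, 5), (1, 3),
    (2, 3), (2, 3), (2, 3), (2, 3), (2, 5), (2, 2), (2, 5), (2, 3), (2, 3), (2, 3),
]

# Assumption: the first category is the baseline, with linear predictor fixed at 0.
CATEGORIES = (2, 3, 5)


def predict_probs(a, b, t, categories=CATEGORIES):
    """Multinomial-logit probabilities at centered time t:
    eta_k = a_k + b_k * t for non-baseline k, eta_baseline = 0, then softmax."""
    etas = {categories[0]: 0.0}
    for k in categories[1:]:
        etas[k] = a[k] + b[k] * t
    top = max(etas.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in etas.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def fit_mlogit(data, categories=CATEGORIES, lr=0.05, tol=1e-10, max_iter=100_000):
    """Illustrative maximum-likelihood fit of log(P(k) / P(baseline)) = a_k + b_k * t,
    with t = group - mean(group), by plain gradient ascent (not production code)."""
    center = sum(g for g, _ in data) / len(data)
    others = categories[1:]
    a = {k: 0.0 for k in others}
    b = {k: 0.0 for k in others}
    for _ in range(max_iter):
        grad_a = {k: 0.0 for k in others}
        grad_b = {k: 0.0 for k in others}
        for group, length in data:
            t = group - center
            probs = predict_probs(a, b, t, categories)
            for k in others:
                # Score of the log-likelihood: observed indicator minus fitted probability.
                resid = (1.0 if length == k else 0.0) - probs[k]
                grad_a[k] += resid
                grad_b[k] += resid * t
        for k in others:
            a[k] += lr * grad_a[k]
            b[k] += lr * grad_b[k]
        if max(abs(v) for v in list(grad_a.values()) + list(grad_b.values())) < tol:
            break
    return a, b, center


if __name__ == "__main__":
    a, b, center = fit_mlogit(DATA)
    for group in (1, 2, 3):  # group 3 is an extrapolation one period ahead
        probs = predict_probs(a, b, group - center)
        print(group, {k: round(p, 3) for k, p in probs.items()})
```

With two groups this model is also saturated, so the fitted probabilities match .2/.3/.5 and .1/.7/.2. The extrapolation to group 3 comes out at roughly .03/.93/.05, which stays inside (0, 1) by construction because each probability is a softmax of linear predictors.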