Background/Context:
Over the last year ServiceY has slowly rolled out across the city. The city is divided into geographic divisions of different sizes which all have different levels of demand for ServiceY. ServiceY currently operates in 15/25 of the city's divisions.
ServiceY is planning to expand into the remaining 9 divisions, and I'm interested in predicting the demand for ServiceY after this expansion.
The observed demand for ServiceY has been collected in aggregate monthly volumes for the past year of operations. This data doesn't specify the source of this demand, but it is known that it must have originated from the divisions ServiceY was operating in at that point in time (ServiceY started with 8 divisions and added more throughout the year, the months at which these divisions were added is known).
It is also known that the demand for ServiceY is closely related to the demand for ServiceX. Historical monthly demand data for serviceX exists for all divisions in the city.
I'm interested in creating a model that relates the demand for ServiceX to the demand for ServiceY. This model could then take the historical average demand for ServiceX in one of the new divisions, and predict the expected demand for ServiceY after it has expanded into that division.
The following is the data I've prepared for this analysis:
| Month | Observed Demand for ServiceX (within all divisions ServiceY is operating at this time) | Total Observed Demand for ServiceY |
|---|---|---|
| 1 | 543 | 275 |
| 2 | 473 | 294 |
| 3 | 747 | 604 |
| 4 | 956 | 639 |
| 5 | 918 | 566 |
| 6 | 877 | 560 |
| 7 | 1109 | 706 |
| 8 | 1083 | 651 |
| 9 | 981 | 559 |
| 10 | 984 | 575 |
| 11 | 1191 | 615 |
| 12 | 1138 | 686 |
The outputs of linear regression on this data is as follows (month # was not included as an independent variable):
Intercept - Coefficient: 62.5, SE: 75.1, P-Value 0.42
ServiceX - Coefficient: 0.544, SE:0.078, P-Value 2.33E-05
Regression Stats:
- Multiple R: 0.903189335
- R Square: 0.815750975
- Adjusted R Square 0.799001063
- Standard Error 62.05308691
The question: When modeling this relationship through Excel's linear regression function, it's unclear if the intercept should be set to 0 for this context.
If serviceY has no operations within any divisions (DserviceX = 0), the demand for serviceX within the divisions serviceY operates would of course be 0. However, I believe forcing the intercept through 0 in this model would result in a slope that's bias, and would reduce the value of this model for use in predicting the new demand. I'd like to better understand the implications of including the constant in these conditions.