0

Background/Context:

  • Over the last year ServiceY has slowly rolled out across the city. The city is divided into geographic divisions of different sizes which all have different levels of demand for ServiceY. ServiceY currently operates in 15/25 of the city's divisions.

  • ServiceY is planning to expand into the remaining 9 divisions, and I'm interested in predicting the demand for ServiceY after this expansion.

  • The observed demand for ServiceY has been collected in aggregate monthly volumes for the past year of operations. This data doesn't specify the source of this demand, but it is known that it must have originated from the divisions ServiceY was operating in at that point in time (ServiceY started with 8 divisions and added more throughout the year, the months at which these divisions were added is known).

  • It is also known that the demand for ServiceY is closely related to the demand for ServiceX. Historical monthly demand data for serviceX exists for all divisions in the city.

  • I'm interested in creating a model that relates the demand for ServiceX to the demand for ServiceY. This model could then take the historical average demand for ServiceX in one of the new divisions, and predict the expected demand for ServiceY after it has expanded into that division.

The following is the data I've prepared for this analysis:

Month Observed Demand for ServiceX (within all divisions ServiceY is operating at this time) Total Observed Demand for ServiceY
1 543 275
2 473 294
3 747 604
4 956 639
5 918 566
6 877 560
7 1109 706
8 1083 651
9 981 559
10 984 575
11 1191 615
12 1138 686

The outputs of linear regression on this data is as follows (month # was not included as an independent variable):

Intercept - Coefficient: 62.5, SE: 75.1, P-Value 0.42

ServiceX - Coefficient: 0.544, SE:0.078, P-Value 2.33E-05

Regression Stats:

  • Multiple R: 0.903189335
  • R Square: 0.815750975
  • Adjusted R Square 0.799001063
  • Standard Error 62.05308691

The question: When modeling this relationship through Excel's linear regression function, it's unclear if the intercept should be set to 0 for this context.

If serviceY has no operations within any divisions (DserviceX = 0), the demand for serviceX within the divisions serviceY operates would of course be 0. However, I believe forcing the intercept through 0 in this model would result in a slope that's bias, and would reduce the value of this model for use in predicting the new demand. I'd like to better understand the implications of including the constant in these conditions.

  • I would like to suggest that much will become clear upon examining a plot of these data, with the x-axis representing months 0 - 12 and the y-axis representing the observations, both on accurate linear scales: you will immediately see whether the point (0,0) is consistent with the others in the sense of falling near a common line. This examination should also expand your consideration to whether it is appropriate to include the first 1 - 3 months of data or even whether a line is a suitable model for the time trend. – whuber Jul 07 '23 at 15:19
  • @whuber Thank you for your response. I'm not sure the relationship between time and the observations is important for this analysis. The small growth over time seems negatable compared to the growth from geographic expansion of serviceY. I may be misunderstanding. – SoccerDadExcel Jul 07 '23 at 16:02
  • The relationship is the crucial thing, is it not? If not, then why are you conducting this regression in the first place?? – whuber Jul 07 '23 at 16:05
  • @whuber the relationship between the demand for ServiceX and ServiceY is crucial, as the demand for ServiceX in the other divisions is used to estimate the future demand for ServiceY. – SoccerDadExcel Jul 07 '23 at 16:12
  • 1
    I'm sorry--I did not see the rightmost column because it scrolls out of the window. A more succinct version of your table would help avoid this problem. Regardless, the advice stands (although the answer might change): plot the response against the explanatory variables to evaluate the plausibility of a no-intercept model. – whuber Jul 07 '23 at 18:04

0 Answers0