
We are trying to understand the impact of the number of workdays on sales.

Please find reprex below:

library(tidyverse)

# Work days for January from 2010 - 2018
data = data.frame(work_days = c(20,21,22,20,20,22,21,21),
           sale = c(1205,2111,2452,2054,2440,1212,1211,2111))

# Apply linear regression
model = lm(sale ~ work_days, data)

summary(model)
Call:
lm(formula = sale ~ work_days, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-677.8 -604.5  218.7  339.0  645.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2643.82    5614.16   0.471    0.654
work_days     -38.05     268.75  -0.142    0.892

Residual standard error: 593.4 on 6 degrees of freedom
Multiple R-squared:  0.00333,   Adjusted R-squared:  -0.1628 
F-statistic: 0.02005 on 1 and 6 DF,  p-value: 0.892

Could you please help me understand the coefficients? Does this mean that every workday decreases sales by 38.05?


data = data.frame(work_days = c(20,21,22,20,20,22,21,21),
           sale = c(1212,1211,2111,1205,2111,2452,2054,2440))

model = lm(sale ~ work_days, data)

summary(model)
Call:
lm(formula = sale ~ work_days, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-686.8 -301.0   -8.6  261.3  599.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -6220.0     4555.9  -1.365    0.221
work_days      386.6      218.1   1.772    0.127

Residual standard error: 481.5 on 6 degrees of freedom
Multiple R-squared:  0.3437,    Adjusted R-squared:  0.2343 
F-statistic: 3.142 on 1 and 6 DF,  p-value: 0.1267

Does this mean that every workday increases sales by 387? And how should I interpret the negative intercept?

Similar questions, but I couldn't apply what I learned from them:

Interpreting regression coefficients in R

Interpreting coefficients from Logistic Regression from R

Linear combination of regression coefficients in R

  • The F-statistics of both your models suggest that the distribution of sales is not conditional on workdays. In the second dataset, the p-value is 0.1267, so your model is significant only at the ~87% confidence level. – Dayne Sep 19 '19 at 09:29
  • Why did you repost this question? You got your answer two days ago: https://stackoverflow.com/a/57957391/1412059 – Roland Sep 19 '19 at 10:12

1 Answer

Both your interpretations in bold are correct.

The intercept is the fitted value when all predictors have a value of zero. So in your second model, zero workdays would imply sales of -6220. This illustrates why you can only interpret a model over the actually observed range of the predictors; I assume none of your observations has zero workdays.
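To make the point about the intercept concrete, here is a minimal sketch in base R. It refits the second model from the question and compares the prediction at zero workdays (an extrapolation) with predictions inside the observed range. The object names `at_zero`, `observed_range` and `in_range` are only illustrative and do not appear in the question.

```r
# Data and model copied from the second example in the question
data <- data.frame(work_days = c(20, 21, 22, 20, 20, 22, 21, 21),
                   sale = c(1212, 1211, 2111, 1205, 2111, 2452, 2054, 2440))
model <- lm(sale ~ work_days, data = data)

# The intercept is simply the prediction at work_days = 0,
# which lies far outside the data (an extrapolation)
at_zero <- predict(model, newdata = data.frame(work_days = 0))

# Range of workdays that was actually observed
observed_range <- range(data$work_days)

# Predictions within the observed range are the ones worth interpreting
in_range <- predict(model, newdata = data.frame(work_days = 20:22))

print(at_zero)
print(observed_range)
print(in_range)
```

The negative value at zero workdays is only the fitted line extended far beyond the data, so it says nothing about what sales would be in a month with no workdays.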

Stephan Kolassa
  • Thanks for the response. However, I was wondering whether my interpretations really make sense, because the p-value is not less than .05 and the sample size is just 8 points. Additionally, R-squared implies that the model can't explain even 50% of the variation – Abhishek Sep 19 '19 at 08:53
  • $p>.05$ means that the true coefficient could easily be zero, or even have the opposite sign. (After all, your coefficient is just an estimate of the true value based on your particular sample.) A sample size of 8 is indeed small, and it contributes to a large $p$ value. All of which does not change the interpretation of your model and just means that you should treat your results with caution. $R^2<0.50$ just means that your observations are not explained very well, which is a common occurrence, see this question. – Stephan Kolassa Sep 19 '19 at 09:06
  • If this were a real data set, I would conclude that there is insufficient evidence for an impact of number of workdays on sales. In interpreting coefficients, just remember the coefficients are just part of the equation for a line: predicted sales = -6220 + 386.6*work_days – Jdub Sep 19 '19 at 15:37
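The comments above can be checked with a short sketch in base R, again using the second dataset from the question. It computes a 95% confidence interval for the slope (to see whether zero or a negative value is plausible) and reproduces a prediction by hand from the line equation. The names `ci`, `by_hand` and `from_predict` are illustrative, and 21 workdays is just an arbitrary value inside the observed range.

```r
# Data and model copied from the second example in the question
data <- data.frame(work_days = c(20, 21, 22, 20, 20, 22, 21, 21),
                   sale = c(1212, 1211, 2111, 1205, 2111, 2452, 2054, 2440))
model <- lm(sale ~ work_days, data = data)

# 95% confidence intervals for the intercept and the slope
ci <- confint(model, level = 0.95)
print(ci)

# Does the interval for the slope include zero?
slope_includes_zero <- ci["work_days", 1] < 0 && ci["work_days", 2] > 0
print(slope_includes_zero)

# The coefficients are just the equation of a line:
# predicted sale = intercept + slope * work_days
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["work_days"]] * 21
from_predict <- predict(model, newdata = data.frame(work_days = 21))

print(by_hand)
print(unname(from_predict))
```

If the interval for `work_days` spans zero, the data are compatible with no effect or with an effect of the opposite sign, which is the caution expressed in the comments.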