
How to interpret Pr(>|t|) results?

Can I consider "speed" significant in the regression even though the "intercept" is not statistically different from zero? Or is a linear regression model only reliable if the "intercept" is significant? Example:

summary(lm(dist~speed,data=cars))
# 
# Call:
# lm(formula = dist ~ speed, data = cars)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -29.069  -9.525  -2.272   9.215  43.201 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
# speed         3.9324     0.4155   9.464 1.49e-12 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 15.38 on 48 degrees of freedom
# Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438 
# F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
  • It would help if you could show the output from a specific model for which you have this question: edit the question, paste in the output, select it, and use the code {} tool to make it readable. There usually are multiple coefficients reported for a model, and typically one worries first about whether the model as a whole is significantly different from a model with no predictors. The p values for differences of individual coefficients from 0 can be misleading, particularly when there are interaction terms in the model. – EdM May 19 '23 at 18:31
  • The p-value for the intercept and the p-value for the effect of speed (i.e. the slope) are for different hypotheses. If you are interested in whether there is a significant effect of speed, you typically look only at the p-value for speed, to see whether the null hypothesis of a slope of 0 has been rejected. – Axeman May 19 '23 at 18:37
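To make those two comments concrete, here is a small sketch (added for illustration, not part of either comment): the F test at the bottom of summary() compares the fitted model against an intercept-only model, and the speed row tests whether the slope differs from 0.

fit0 <- lm(dist ~ 1, data = cars)       # model with no predictors
fit1 <- lm(dist ~ speed, data = cars)   # model with speed as predictor
anova(fit0, fit1)                       # same F = 89.57, p = 1.49e-12 as in summary(fit1)
confint(fit1)["speed", ]                # confidence interval for the slope, well away from 0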

2 Answers


This page has a useful explanation of what this type of model summary shows.

As @Axeman said in a comment, the p values for the two coefficients test different hypotheses. In particular, the "significance" of the Intercept is whether its value is "significantly" different from 0. Simply recoding the same data in a way that doesn't fundamentally change the underlying model can change that "significance" substantially.

For example, people sometimes "center" variables in regression to have mean values of 0, by subtracting the mean value of each variable from the individual values. That provides the same association between outcome and predictor (what's usually of primary interest), but makes the Intercept here completely "insignificant":

summary(lm(I(dist-mean(dist))~I(speed-mean(speed)),data=cars))
# 
# Call:
# lm(formula = I(dist - mean(dist)) ~ I(speed - mean(speed)), data = cars)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -29.069  -9.525  -2.272   9.215  43.201 
# 
# Coefficients:
#                         Estimate Std. Error t value Pr(>|t|)    
# (Intercept)            1.397e-14  2.175e+00   0.000        1    
# I(speed - mean(speed)) 3.932e+00  4.155e-01   9.464 1.49e-12 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 15.38 on 48 degrees of freedom
# Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438 
# F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

This gets even more confusing when there are interaction terms in a model, as then even the apparent "significance" of an individual predictor in a display like this can change when you center another predictor with which it interacts. Be very careful when interpreting individual coefficient p values from this type of model summary.
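For example, here is a small simulated sketch (not the cars data; the variable names and numbers are made up for illustration) of how centering one predictor changes the apparent "significance" of the other predictor's main-effect row when the two interact:

set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200, mean = 10)        # x2 is centered far from 0
y  <- 2 * x1 * x2 + rnorm(200)     # the effect of x1 depends entirely on x2

# Uncentered: the x1 row reports the slope of x1 at x2 = 0, which is 0 in the
# simulated truth, so it will typically look "insignificant".
summary(lm(y ~ x1 * x2))

# With x2 centered, the x1 row reports the slope of x1 at the mean of x2
# (about 20 here), so its p value changes dramatically; the interaction term
# and the overall fit statistics are unchanged.
summary(lm(y ~ x1 * I(x2 - mean(x2))))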

This particular example happens to show an important reason for not immediately jumping to centering. Think about the data: they are for stopping distances (in feet) for cars that were originally going at different speeds (in miles per hour). The Intercept is not only "significantly" different from 0: it's negative. That means that, if you were going 0 miles per hour, you would go backward more than 17 feet when you applied the brake! There clearly is a problem with the way that the model represents reality. You might not have seen that if you just centered all the variables to start.
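As a quick check (a sketch using the same model fit as above), you can ask the fitted line for its prediction at speed 0:

fit <- lm(dist ~ speed, data = cars)
predict(fit, newdata = data.frame(speed = 0))  # about -17.6 feet: a negative stopping distance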

EdM

Plot it in order to gain more insight into the meanings of those statistical tests.

The command plot(cars) creates a scatterplot whose points lie more or less along a line with a clear slope but a y-intercept not much different from zero.

plot(cars)
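Overlaying the fitted regression line (an optional addition, not in the original answer) helps relate the plot to the coefficients in the summary:

plot(cars)                              # stopping distance (dist) against speed
abline(lm(dist ~ speed, data = cars))   # add the fitted least-squares line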

  • A test of the intercept is not directly a test of whether the line fits the data. Often the intercept is an arbitrary point on the fitted curve: it is the level where the curve intersects the line x=0, but we could just as well use the intercept with another line x=a. Some of the graphs in this answer show this: https://stats.stackexchange.com/a/506196/ A test of whether the intercept parameter differs from zero makes sense when it has some meaning for the model (e.g. in chemometrics it could be a baseline for some measurement). – Sextus Empiricus May 19 '23 at 20:43
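To illustrate that last point with the same data (a sketch; the shift of 15 mph is an arbitrary choice), re-expressing speed so that "zero" falls inside the observed range makes the intercept the fitted distance at speed 15 rather than an extrapolation to speed 0:

# Same slope as before; the intercept is now the fitted stopping distance at
# speed = 15 mph, a point that actually lies within the data.
summary(lm(dist ~ I(speed - 15), data = cars))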