
I'm fairly new to linear models so I'd like to have an explanation for this phenomenon.

I calculated two linear models with R. The first one outputs this:

lm(formula = Gries_PM10_value ~ Gries_PM10_value_lag1 + Gries_LUTE_value + 
    Kalkleiten_WIGE_value)

Residuals:
     Min       1Q   Median       3Q      Max 
-23.1273  -4.8379  -0.2452   5.1194  24.6230 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           15.83196    2.05175   7.716 1.73e-12 ***
Gries_PM10_value_lag1  0.47306    0.06918   6.838 2.04e-10 ***
Gries_LUTE_value      -0.24091    0.16299  -1.478  0.14155    
Kalkleiten_WIGE_value -1.65277    0.53690  -3.078  0.00249 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.866 on 146 degrees of freedom
Multiple R-squared:  0.2966,    Adjusted R-squared:  0.2821 
F-statistic: 20.52 on 3 and 146 DF,  p-value: 3.773e-11

The variables refer to two measurement sites, Gries and Kalkleiten, which are 7 km apart. PM10 is a measure of fine particulate matter, LUTE is the air temperature, WIGE is the wind speed, and PM10_lag1 is the previous day's PM10 value.

So this model behaved as I'd expect: the previous day's PM10 value is highly significant, the wind speed is also significant, but the air temperature is not.
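
For reference, the fit itself is just a plain lm() call followed by summary(); the data frame name dat below is only a placeholder for my actual data:

# Fit of the first model; `dat` is a placeholder for the data frame
# holding the daily Gries/Kalkleiten measurements.
m1 <- lm(Gries_PM10_value ~ Gries_PM10_value_lag1 + Gries_LUTE_value +
           Kalkleiten_WIGE_value, data = dat)
summary(m1)   # produces the output shown above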

But then I added the air temperature for Kalkleiten and got:

lm(formula = Gries_PM10_value ~ Gries_PM10_value_lag1 + Gries_LUTE_value + 
    Kalkleiten_WIGE_value + Kalkleiten_LUTE_value)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.0517  -4.2531  -0.0896   4.4151  19.3326 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           22.85199    1.94546  11.746  < 2e-16 ***
Gries_PM10_value_lag1  0.28269    0.06309   4.481 1.50e-05 ***
Gries_LUTE_value      -2.68195    0.34075  -7.871 7.49e-13 ***
Kalkleiten_WIGE_value -1.95956    0.45343  -4.322 2.86e-05 ***
Kalkleiten_LUTE_value  2.52227    0.32231   7.826 9.65e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.618 on 145 degrees of freedom
Multiple R-squared:  0.5054,    Adjusted R-squared:  0.4918 
F-statistic: 37.05 on 4 and 145 DF,  p-value: < 2.2e-16

Now all the coefficients are strongly significant. Overall, this model seems to fit much better, even though I wouldn't have expected the air temperature to make such a big difference. What is the explanation for this?
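
If it helps to make the comparison concrete, the two fits can be compared directly with a nested-model F-test, and the two temperature series can be checked against each other (m1, m2 and dat are placeholder names, continuing the sketch above, not part of the output):

# m2 adds Kalkleiten_LUTE_value to the first model (m1 as above)
m2 <- update(m1, . ~ . + Kalkleiten_LUTE_value)

# Nested-model F-test: does adding Kalkleiten_LUTE_value improve the fit?
anova(m1, m2)

# The two sites are only 7 km apart, so their temperatures are presumably
# highly correlated; this checks how strongly the two series move together.
cor(dat$Gries_LUTE_value, dat$Kalkleiten_LUTE_value, use = "complete.obs")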

  • I always search for answers to this (and related questions) using the keywords "regression significant not". That search picks up dozens of interesting threads on the topic of significance changing when variables are added to and removed from models. – whuber Feb 17 '23 at 22:44
  • @whuber OK thanks, I'll look into them. I didn't know that this was such a widely discussed topic here. – Quotenbanane Feb 17 '23 at 22:55
  • Just about anything important concerning ordinary least squares multiple regression has been well covered over the last 12 years and 200,000 questions! – whuber Feb 17 '23 at 23:00
