
It's clear to me how to interpret the coefficients of a quadratic regression:

data <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                   happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

data$hours2 <- data$hours^2

quadraticModel <- lm(happiness ~ hours + hours2, data=data)

summary(quadraticModel)

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.25364    6.18507  -2.951   0.0184 *
hours         6.74436    0.48551  13.891 6.98e-07 ***
hours2       -0.10120    0.00746 -13.565 8.38e-07 ***

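# hourValues was not defined in the post; an evenly spaced grid over 0 to 60 is assumed here
hourValues <- seq(0, 60, 0.1)

happinessPredict <- predict(quadraticModel,list(hours=hourValues, hours2=hourValues^2))

plot(data$hours, data$happiness, pch=16)
lines(hourValues, happinessPredict, col='blue')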

However, what isn't clear is why this works. Both hours and hours2 only ever increase, hours2 ever more steeply. How does squaring hours and adding it to the model allow the model to capture the quadratic trend?

Is there anyone who could provide me with a non-mathematical explanation for this?

locus
  • "Both hours and hours2 have a positive relationship with happiness." Plotting your data will show this is false. Or even just look at the values: happiness peaks at hours = 35 and then declines, i.e. it has a unimodal relationship. – mkt Sep 12 '22 at 19:56
  • It is difficult to conceive of a "non-mathematical" way to explain, with any semblance of accuracy, the purely mathematical concept of a quadratic formula. Geometric explanations can be offered, as well as algebraic ones, but anything non-mathematical would be so non-quantitative as to be of doubtful value. One of the more constructive ways you can approach this situation is to plot some quadratic functions. – whuber Sep 12 '22 at 20:04
  • @mkt, yes, that's right. I actually wanted to write that hours and hours2 both show a positive increase, and that it's confusing to me how hours2 can change the direction of the predictions – locus Sep 12 '22 at 22:25
  • @whuber, yes, maybe I shouldn't have used the term "non-mathematical". I understand it's probably difficult to explain this in a purely conceptual way. I just wanted to avoid answers with loads of formulas that for a non-statistician like myself wouldn't be very helpful understanding the issue I'm having – locus Sep 12 '22 at 22:35
  • One thing I realised from reading this question and @EdM's excellent answer is that I was trying to think of hours and hours2 being two separate predictors, when in fact it's one predictor with a quadratic and linear term. I think lots of learning resources on quadratic regression say to 'just add' the quadratic term to the regression to see if it improves fit etc., as if it was a different variable – user2296603 Sep 13 '22 at 09:51

1 Answer


The individual associations of your hours and hours2 with happiness are extremely weak in your example, and nothing completely "non-mathematical" can explain this. Maybe the following plot can help illustrate how multiple regression allows the predictor hours2 to improve on predictions based solely on hours.

[Figure: plot of data, models, and components]

The values are circles. The dashed black line shows the linear association of happiness with hours alone. Not very good, not even "statistically significant" (p = 0.53 for the hours coefficient).
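That comparison can be checked directly. The following is only a sketch that refits both models on the data from the question; the names linearModel and quadFit are illustrative, and I(hours^2) is used in place of the question's hours2 column, which gives the same fit.

```r
# Data from the question
data <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                   happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

# Straight line only: the fit behind the dashed line
linearModel <- lm(happiness ~ hours, data=data)

# Same model plus the squared term; I() squares hours inside the formula
quadFit <- lm(happiness ~ hours + I(hours^2), data=data)

# p-value of the hours slope in the straight-line model
pLinear <- summary(linearModel)$coefficients["hours", "Pr(>|t|)"]

# Share of the variance in happiness explained by each model
r2Linear <- summary(linearModel)$r.squared
r2Quad <- summary(quadFit)$r.squared

print(c(pLinear=pLinear, r2Linear=r2Linear, r2Quad=r2Quad))

# F-test: does adding the squared term improve on the straight line?
print(anova(linearModel, quadFit))
```

The straight line has to pick a single slope for the whole range of hours, so the rise at low hours and the fall at high hours largely cancel each other out.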

The solid black line shows the full model. You might think of this as starting with a linear extrapolation of the values near hours = 0, shown in the blue line. You might then think of the (squared) hours2 term as providing a non-linear correction to that extrapolation. Subtract the red curve from the blue line and you get the full model.
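To see the same thing in numbers, here is a rough sketch that splits the fitted model into those two pieces at a few values of hours. The names blueLine, redCurve, slope and turningPoint are made up for illustration, and I(hours^2) again stands in for the hours2 column.

```r
# Data from the question
data <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                   happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

quadraticModel <- lm(happiness ~ hours + I(hours^2), data=data)

# b[1] = intercept, b[2] = coefficient of hours, b[3] = coefficient of hours^2
b <- unname(coef(quadraticModel))

x <- c(0, 10, 20, 30, 40, 50, 60)

blueLine <- b[1] + b[2]*x        # intercept plus linear term
redCurve <- -b[3]*x^2            # negative of the squared term (b[3] is negative)
fullModel <- blueLine - redCurve # blue line minus red curve

# Slope of the fitted curve at each x: the constant push up from hours
# plus a pull down from hours^2 that grows in proportion to x
slope <- b[2] + 2*b[3]*x

# Value of hours where the two effects balance and the curve peaks
turningPoint <- -b[2]/(2*b[3])

print(data.frame(hours=x, blueLine, redCurve, fullModel, slope))
print(turningPoint)
```

The hours term adds the same amount for every extra hour, while the amount removed by the hours2 term grows faster and faster. The net slope therefore shrinks, reaches zero at roughly hours = 33, and is negative after that, which is how a term that only ever increases can still turn the predictions downward.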

Code in R:

plot(happiness~hours,data,bty="n",xlim=c(0,60),ylim=c(0,300))
abline(lm(happiness~hours,data),lty=2)
abline(-18.2536,6.7444,col="blue") # "extrapolation" from 'hours' near 0
curve(.1012*x^2,from=0,to=60,add=TRUE,col="red") # non-linear "correction"
curve(-18.2536+6.7444*x-0.1012*x^2,from=0,to=60,add=TRUE) # full model
legend("topleft",bty="n",
       lty=c(2,1,1,1),col=c("black","black","blue","red"),
       legend=c("black dashed, linear 'hours' alone",
                "black solid, full model",
                "blue, 'hours' component, full model",
                "red, negative of 'hours2' component, full model"))
EdM
  • Many thanks @EdM, that's definitely helpful in visualizing it! I just don't get why the curve for hours2 is .1012*x^2 and not -18.2536 - .1012*x^2, since this would be the regression equation for hours2 when hours=0 wouldn't it? – locus Sep 12 '22 at 22:51
  • @locus the model is y = -18.2536 + 6.7444*x - 0.1012*x^2. For illustration I combined the first 2 terms (intercept and x term) in the blue line, and plotted the negative of the x^2 term separately as the red curve. If I had included the intercept with the x^2 term in the red curve instead, I would have had to remove it from the formula for the blue line, and that line's close agreement with the values at low x wouldn't have been obvious. For this model the "regression equation" when x=0 doesn't make sense, as then x^2=0 also. – EdM Sep 13 '22 at 12:39