I am in the process of building a strong understanding of what I believe is the conceptual basis of linear regression.

One thing I am struggling with is the relationship between the regression slope and things like the variation of the variables, the p-value, and the overall utility of a given model.

As I understand it intuitively, the steepness of the slope affects the p-value. That is, if the regression line is either vertical or horizontal, the p-value will be 1, as this reflects a lack of relationship between the variables.

[Figure: graph showing regression lines close to horizontal and vertical, drawn in red]

But then again, what about situations in which the regression line deviates only slightly from vertical or horizontal (like the red lines on the graph)? I would assume that the significance and overall utility of the model then depend more on the residuals. Theoretically, if all residuals equalled zero, we would have a perfect correlation. Does this mean that, unless the regression line is perfectly horizontal or vertical, the slope on its own cannot say anything about the strength and significance of the model?

Thank you in advance for answering my questions and correcting my thinking where it is needed.

Caban
  • This question becomes more interesting and meaningful when you plot the regression line in standardized coordinates, for then the slope equals the correlation coefficient. That coupled with the amount of data permits p-values to be computed (subject to assumptions about how the points are actually distributed so we aren't worrying about outliers or high-leverage points). – whuber Mar 11 '22 at 17:53
  • Thank you for your comment. What about the relative distance of points from the regression line? I always thought that this is the crucial factor that affects the correlation coefficient (like illustrated with these graphs: https://pinkocean.pl/wp-content/uploads/2020/10/Pearson_Correlation_Coefficient_and_associated_scatterplots.png ) – Caban Mar 12 '22 at 09:39
  • The Euclidean distance is not appropriate for this model, but the vertical distance (namely, the absolute residual) is. The relevant way to summarize these vertical distances is through their mean square. In standardized coordinates, the mean square equals $1-R^2,$ showing how directly it reflects the closeness of the fit. Such facts become clear and intuitive once you understand the underlying geometry, which I illustrate and explain at https://stats.stackexchange.com/questions/71260. – whuber Mar 12 '22 at 14:04
  • Thank you, that's a thorough, solid explanation! – Caban Mar 13 '22 at 09:27
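
The two facts mentioned in the comments (in standardized coordinates the slope equals the correlation coefficient, and the mean squared residual equals $1-R^2$) can be checked numerically. What follows is only a minimal sketch on simulated data, not anything taken from the thread: the sample size, the seed, the true slope of 0.05, the noise level, and every helper name (`standardize`, `ols_slope_intercept`, `pearson_r`, `t_statistic`) are assumptions made for illustration. It uses only the Python standard library and standardizes with the population SD (divisor $n$), so that the mean square is taken over $n$ as well.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to unit variance (population SD, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: slope = 0 in simple regression (n - 2 df).

    It depends only on r and n, not on the raw slope.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Simulated data (assumed values): a shallow line with very small residuals.
rng = random.Random(0)
n = 50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.05 * a + rng.gauss(0, 0.01) for a in x]

raw_slope, _ = ols_slope_intercept(x, y)
r = pearson_r(x, y)

# Standardized coordinates: the slope should equal r and the
# mean squared residual should equal 1 - r^2.
zx, zy = standardize(x), standardize(y)
std_slope, std_intercept = ols_slope_intercept(zx, zy)
residual_ms = sum(
    (b - (std_intercept + std_slope * a)) ** 2 for a, b in zip(zx, zy)
) / n

# Changing the units of x changes the raw slope but not r (or the t statistic).
x_rescaled = [1000 * a for a in x]
rescaled_slope, _ = ols_slope_intercept(x_rescaled, y)
r_rescaled = pearson_r(x_rescaled, y)

print("raw slope:", raw_slope)
print("correlation r:", r)
print("standardized slope:", std_slope)
print("mean squared residual:", residual_ms, "vs 1 - r^2:", 1 - r * r)
print("slope after rescaling x:", rescaled_slope)
print("t statistic:", t_statistic(r, n))
```

In this sketch the raw line is nearly horizontal, yet the fit is tight, because the raw slope depends on the units of the variables while the correlation and the t statistic do not. The p-value would follow from comparing the t statistic with a t distribution on $n-2$ degrees of freedom, under the usual model assumptions.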
