Normality assumption in linear regression

Question

The normality assumption in linear regression concerns the distribution of the error term, but it is sometimes wrongly "extended" to, or interpreted as, a requirement that y or x be normally distributed.

Is it possible to construct a scenario or dataset in which X and Y are non-normal but the error term is normal, so that the resulting linear regression estimates are valid?
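For reference, the model in question can be written as below (the symbols $\beta_0$, $\beta_1$, $\sigma$, $p$ and $\varphi_\sigma$, the $\mathcal{N}(0, \sigma^2)$ density, are our notation, not the asker's). The normality assumption is about the errors, i.e. the conditional distribution of $y$ given $x$. The marginal distribution of $y$ is a mixture over the distribution of $x$ and need not be normal: with a Bernoulli($p$) regressor, as in the comments below, it is a two-component normal mixture, which is not normal whenever $0 < p < 1$ and $\beta_1 \neq 0$.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
  \quad \varepsilon_i \text{ independent of } x_i \\
\Rightarrow\quad y_i \mid x_i &\sim \mathcal{N}(\beta_0 + \beta_1 x_i,\ \sigma^2) \\
x_i \sim \text{Bernoulli}(p) \;\Rightarrow\quad f_y(y) &= (1 - p)\,\varphi_\sigma(y - \beta_0) + p\,\varphi_\sigma(y - \beta_0 - \beta_1)
\end{aligned}
```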

Trivial example: X has a Bernoulli distribution (i.e., it takes only the values 0 and 1); Y = X + N(0, 0.1). Neither X nor Y is normally distributed on its own, but regressing Y on X still works. — Hong Ooi, Feb 17 '14 at 08:45
I guess you are thinking about the distribution of the residuals, not the distribution of the variables. — tashuhka, Feb 17 '14 at 10:03
I have an example worked out here: What if residuals are normally distributed but Y is not? — gung - Reinstate Monica, Feb 17 '14 at 14:52
Related: https://stats.stackexchange.com/questions/148803/how-does-linear-regression-use-the-normal-distribution — kjetil b halvorsen, Dec 07 '19 at 13:25
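Hong Ooi's example can be simulated in a few lines of R. This is a sketch under assumptions not stated in the comment: N(0, 0.1) is read as a standard deviation of 0.1 (R's `rnorm` takes a standard deviation, not a variance), and n = 200, P(X = 1) = 0.5 and the seed are arbitrary choices.

```r
# Sketch of Hong Ooi's example: Bernoulli X, Y = X + normal noise.
# Assumptions (not from the comment): n = 200, P(X = 1) = 0.5, noise sd = 0.1.
set.seed(1)
n <- 200
x <- rbinom(n, size = 1, prob = 0.5)    # X takes only the values 0 and 1
y <- x + rnorm(n, mean = 0, sd = 0.1)   # Y is a two-component normal mixture

fit <- lm(y ~ x)                        # ordinary least squares fit
coef(fit)                               # intercept near 0, slope near 1

# The marginal of y is strongly bimodal, so a normality test on y rejects,
# while the residuals have a spread close to the true noise sd of 0.1.
shapiro.test(y)$p.value
sd(residuals(fit))
```

The regression recovers the true intercept and slope even though neither variable is normal on its own; only the noise around the fitted line needs to be.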

score 16 · Answer 1 · answered Feb 17 '14 at 13:16

Expanding on Hong Ooi's comment with an image. Here is a dataset in which neither marginal distribution is normal, yet the residuals are, so the assumptions of linear regression still hold:

[Figure: scatter plot of y against x with marginal histograms of x and y]

The image was generated by the following R code:

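library(psych)  # provides scatter.hist()

x <- rbinom(100, 1, 0.3)               # 100 Bernoulli(0.3) draws: x is 0 or 1
y <- rnorm(length(x), 5 + x * 5, 1)    # y | x ~ N(5 + 5x, 1): normal errors with sd 1

# Scatter plot of y against x with marginal histograms
scatter.hist(x, y, correl = FALSE, density = FALSE, ellipse = FALSE,
             xlab = "x", ylab = "y")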
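To check the claim numerically rather than visually, one can fit the regression on data simulated the same way and compare a normality test on y with one on the residuals. This is a sketch using base R only; the seed is an arbitrary addition for reproducibility, and the Shapiro-Wilk test is just one possible check (a QQ plot such as `qqnorm(residuals(fit))` is the usual visual counterpart).

```r
# Same data-generating process as the answer: x ~ Bernoulli(0.3),
# y | x ~ N(5 + 5x, 1). The seed is an arbitrary choice for reproducibility.
set.seed(42)
x <- rbinom(100, 1, 0.3)
y <- rnorm(length(x), 5 + x * 5, 1)

fit <- lm(y ~ x)
summary(fit)$coefficients   # both estimates should be close to 5

# Marginal distribution of y: a mixture of N(5, 1) and N(10, 1), not normal.
p_y <- shapiro.test(y)$p.value

# Residuals estimate the N(0, 1) errors, which is what the assumption is about.
res <- residuals(fit)
p_res <- shapiro.test(res)$p.value

c(p_y = p_y, p_residuals = p_res)
```

A very small `p_y` alongside an unremarkable `p_res` is the numerical version of the picture: y fails a normality test, while the residuals give no evidence against normality.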
