
The normality assumption in linear regression concerns the distribution of the error term, but it is sometimes wrongly "extended" to, or interpreted as, a requirement that y or x be normally distributed.

Is it possible to construct a scenario/dataset where X and Y are non-normal but the error term is normal, so that the linear regression estimates obtained are still valid?

ECII

1 Answer


Expanding on Hong Ooi's comment with an image. Here is a dataset where neither marginal distribution (of x or of y) is normal, yet the residuals are, so the normality assumption of linear regression, which concerns the errors (equivalently, y conditional on x), still holds:

[Figure: scatter plot of y against x, with marginal histograms of x and y]
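Written out with the parameters used in the R code that generated the image (intercept 5, slope 5, error standard deviation 1, P(x = 1) = 0.3), the model is as follows. Conditionally on x, y is exactly normal, but marginally y is a two-component normal mixture, which is bimodal here because the component means are 5 standard deviations apart (φ denotes the standard normal density):

$$
y_i = 5 + 5x_i + \varepsilon_i, \qquad x_i \sim \text{Bernoulli}(0.3), \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, 1) \text{ independent of } x_i
$$

$$
y_i \mid x_i \sim N(5 + 5x_i,\, 1), \qquad f_Y(y) = 0.7\,\varphi(y - 5) + 0.3\,\varphi(y - 10)
$$

So E(y) = 5 + 5(0.3) = 6.5 and Var(y) = 1 + 25(0.3)(0.7) = 6.25, but the shape of y is far from normal, while the errors that the assumption refers to are N(0, 1) by construction.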

The image was generated by the following R code:

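library(psych)                        # provides scatter.hist()
set.seed(1)                           # for a reproducible figure
x <- rbinom(100, 1, 0.3)              # binary predictor: P(x = 1) = 0.3
y <- rnorm(length(x), 5 + x * 5, 1)   # y = 5 + 5x + e, with e ~ N(0, 1)

# Scatter plot with marginal histograms; correlation, density curves and ellipse turned off
scatter.hist(x, y, correl = FALSE, density = FALSE, ellipse = FALSE, xlab = "x", ylab = "y")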
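To check this numerically rather than by eye, one can fit the regression and compare the marginal distribution of y with the residuals. The following is a minimal sketch using the same data-generating process; the seed, the larger sample size of 1000 and the use of the Shapiro-Wilk test (`shapiro.test`, base R `stats`) are illustrative choices, not part of the original answer:

```r
# Same data-generating process as above, with a larger sample
# (n = 1000 is an illustrative choice) so the shapes are easier to judge.
set.seed(42)
n <- 1000
x <- rbinom(n, 1, 0.3)                   # binary, clearly non-normal predictor
y <- rnorm(n, mean = 5 + 5 * x, sd = 1)  # y | x ~ N(5 + 5x, 1)

fit <- lm(y ~ x)
res <- residuals(fit)

# The OLS estimates should be close to the true intercept 5 and slope 5
print(coef(fit))

# Shapiro-Wilk tests (shapiro.test accepts 3 to 5000 observations):
# the marginal of y is a two-component mixture, so normality is rejected;
# the residuals estimate N(0, 1) errors, so typically it is not.
p_y   <- shapiro.test(y)$p.value
p_res <- shapiro.test(res)$p.value
cat("Shapiro-Wilk p-value, y:        ", format(p_y), "\n")
cat("Shapiro-Wilk p-value, residuals:", format(p_res), "\n")
```

The p-value for the residuals varies with the seed (under normal errors it will fall below 0.05 roughly 5% of the time), whereas the p-value for the marginal of y is essentially zero at this sample size.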