
When I am developing a predictive statistical model, why do I need to ensure the error is normally distributed? (I have a very small statistical background, so I apologize in advance if this is a very, very basic question).

Johanna
  • Could you please cite where you read that this is important? I dispute the claim and want to see the context. – Dave Sep 09 '20 at 15:39
    Closely related threads, which provide many answers, include https://stats.stackexchange.com/questions/16381, https://stats.stackexchange.com/questions/148803, https://stats.stackexchange.com/questions/86835, https://stats.stackexchange.com/questions/395011, etc. – whuber Sep 09 '20 at 15:46
    Predictive model has somewhat different nuance than regression. Why close? – BigBendRegion Sep 09 '20 at 16:01
  • I second what @BigBendRegion said, but will nevertheless read the linked question and the other questions indicated in the comments closely. Thank you :) – Johanna Sep 09 '20 at 16:43
    The suggested answers really are not adequate, since the focus of predictive modeling is different, so I will provide answers in replies. – BigBendRegion Sep 09 '20 at 18:52
    The conditional distributions of the target variable do matter a great deal for predictive modeling. In the process of checking for normality, you may find very obvious indications of non-normality, indicating that alternative models and/or methods are needed.

    Examples:

    – BigBendRegion Sep 09 '20 at 18:53
  • The data are very discrete. In the most extreme case, the data have only two possible values, in which case you should be using logistic regression for your predictive model. Similarly, with only a small number of ordinal values, you should use ordinal regression, and with only a small number of nominal values, you should use multinomial regression.
  • – BigBendRegion Sep 09 '20 at 18:53
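    A minimal sketch of the binary case, using scikit-learn's `LogisticRegression` on synthetic data (the data-generating model here is invented purely for illustration):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))

    # Synthetic binary target: the outcome has only two possible values,
    # so logistic regression models P(y = 1 | X) rather than a normal mean.
    p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
    y = rng.binomial(1, p)

    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]  # predictions are probabilities in [0, 1]
    ```

    The predictions are probabilities rather than unconstrained real numbers, which is exactly what a two-valued target calls for.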
  • The data are censored. You might realize, in the process of investigating normality, that there is an upper bound. In some cases the upper bound is not really data, just an indication that the true data value is higher. In this case, ordinary predictive models must not be used because of gross biases. Censored data models must be used instead.
  • – BigBendRegion Sep 09 '20 at 18:54
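    To see the bias censoring causes, here is a small NumPy simulation (all numbers invented for illustration): fitting ordinary least squares to values recorded at an upper bound attenuates the slope, which is the bias a censored-data model (e.g., a Tobit model) is designed to avoid.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y_true = 2.0 * x + rng.normal(size=500)  # true slope is 2

    # Censoring: values above the cap are recorded at the cap,
    # so the recorded number is not the true data value.
    cap = 1.0
    y_obs = np.minimum(y_true, cap)

    slope_true = np.polyfit(x, y_true, 1)[0]  # close to 2
    slope_cens = np.polyfit(x, y_obs, 1)[0]   # attenuated toward zero
    ```

    An ordinary fit to the censored values systematically understates the slope, no matter how large the sample grows.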
  • In the process of investigating normality (e.g., using q-q plots) it may become apparent that there are occasional extreme outlier observations (part of the process you are studying) that will grossly affect ordinary predictive models. In such cases it would be prudent to use a predictive model that minimizes something other than squared errors, such as median regression, or (the negative of) a likelihood function that assumes heavy-tailed distributions. Similarly, you should evaluate predictive ability in such cases using something other than squared errors.
  • – BigBendRegion Sep 09 '20 at 18:54
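    A sketch of the outlier case, comparing least squares with median regression via statsmodels' `QuantReg` (synthetic heavy-tailed data with a handful of planted outliers, chosen only for illustration):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(-2, 2, 200)
    y = 1.5 * x + rng.standard_t(df=2, size=200)  # heavy-tailed errors, true slope 1.5
    y[np.argsort(x)[-5:]] += 50                   # a few extreme outliers at large x

    X = sm.add_constant(x)
    ols_slope = sm.OLS(y, X).fit().params[1]            # dragged up by the outliers
    med_slope = sm.QuantReg(y, X).fit(q=0.5).params[1]  # far less affected
    ```

    Minimizing absolute rather than squared errors makes the fit insensitive to a few extreme observations, which is why median regression is a natural choice here.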