When I am developing a predictive statistical model, why do I need to ensure that the errors are normally distributed? (I have very little statistical background, so I apologize in advance if this is a very basic question.)
1 Answer
Normal errors matter much more for inference (hypothesis tests and confidence intervals) than for prediction. See my recent answer here. Depending on the model, assuming normal errors can be ridiculous. Take logistic regression: the output is a probability in $[0,1]$ and the truth is either $0$ or $1$, so each error (truth minus prediction) lies in $[-1,1]$ and, for a given predicted probability, can take only two values. That is nothing like a normal distribution.
When you're making predictions, the evidence that your model is good is whether it predicts accurately on unseen data. That is the job of the legendary "out-of-sample" test or validation data. (The two aren't synonyms, but they are related in that the model being developed never sees either data set during training. Think of not showing students the exam questions while they study questions from your old exams.)
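To make the out-of-sample idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the simulated data, the true line $y = 2 + 3x$, the skewed (shifted exponential) errors, the 80/20 split, and the helper names `fit_line`, `rmse`, and `simulate`. It fits a least-squares line on the training portion only and then checks the prediction error on the held-out portion.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(xs, ys, slope, intercept):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(sq) / len(sq)) ** 0.5

def simulate(n=5000, seed=0):
    """Hypothetical data: y = 2 + 3x + e, where e is clearly non-normal."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Exponential(1) minus 1 has mean 0 and standard deviation 1, but is skewed.
    ys = [2.0 + 3.0 * x + (rng.expovariate(1.0) - 1.0) for x in xs]
    return xs, ys

xs, ys = simulate()
split = int(0.8 * len(xs))                 # assumed 80/20 train/test split
train_x, train_y = xs[:split], ys[:split]
test_x, test_y = xs[split:], ys[split:]    # never used for fitting

slope, intercept = fit_line(train_x, train_y)
train_rmse = rmse(train_x, train_y, slope, intercept)
test_rmse = rmse(test_x, test_y, slope, intercept)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"train RMSE={train_rmse:.3f} test RMSE={test_rmse:.3f}")
```

In this simulation the errors are far from normal, yet the held-out RMSE should land close to the error standard deviation of 1, which is about as well as any predictor of the mean can do here. The test-set number is the evidence about predictive quality; a normality check on the residuals would not tell you that.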
Examples:
– BigBendRegion Sep 09 '20 at 18:53