In regression models such as linear and multiple regression, there are several conditions that must be met, such as normality, non-autocorrelation, and homoscedasticity of the errors. Does an ANN also have conditions that must be met? If so, what are they?
Welcome to Cross Validated! What happens if you violate those conditions and run OLS, anyway? – Dave May 28 '23 at 03:08
1 Answer
There is a bit of an implicit misconception in the original question that conditions such as error normality, variance equality, and lack of autocorrelation are required to run linear regression.
Not one of those is required to run OLS linear regression. All OLS does is solve the correspondence:
$$ \hat\beta_{OLS}\in \underset{\beta=\left( \beta_0,\beta_1,\dots,\beta_p \right)}{\arg\min}\left\{ \sum_{i=1}^{n}\left( y_i-\left( \beta_0+\beta_1x_{i1}+\dots+\beta_px_{ip} \right)\right)^2 \right\} $$
(I say that it is a correspondence instead of an equation because there does not have to be a unique solution $(\arg\min)$, such as if two features add up to a third feature.)
This can be solved, using either linear algebra or numerical methods, no matter what numbers you have and what distributions are in play. The idea that you have to meet certain assumptions to solve the above is not correct. You’re always allowed to estimate a regression and evaluate the predictions.
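To make that concrete, here is a minimal sketch (NumPy, with arbitrary simulated data; all the specifics are only illustrative) showing that the minimization above can always be carried out:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))      # any numbers at all; no distributional requirements
y = rng.normal(size=n)

# Append an intercept column and solve the least-squares problem directly.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)                  # a minimizer exists regardless of the distributions in play
```

(`np.linalg.lstsq` even returns a minimum-norm solution when the design matrix is rank deficient, matching the point above that the $\arg\min$ need not be unique.)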
What those linear regression assumptions give you are certain properties, often desirable, related to inferences. For instance, with $iid$ Gaussian errors, the OLS solution coincides with maximum likelihood estimation, and we can get nice equations to calculate confidence intervals and hypothesis tests.
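For example, a small sketch (using statsmodels, with data simulated so the errors are $iid$ Gaussian by construction; the package and numbers are my own illustration, not part of the question) of the kind of inference those assumptions buy you:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # iid Gaussian errors by construction

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(fit.conf_int())   # confidence intervals whose coverage is justified by those assumptions
print(fit.pvalues)      # t-test p-values for the coefficients
```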
For predictive modeling with a neural network approach, there are some nice pieces of theory (e.g., universal approximation theorems), but the concern typically comes down to one question:
Can this model reliably make useful predictions on data the model has not yet seen?
If the answer to this question is in the affirmative, that is, if the model gives reliable predictions, then that is confirmation of its utility. Typical methods for this validation involve testing on unseen data, rather than relying on theoretical concerns.
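A sketch of that kind of validation (here with scikit-learn's `MLPClassifier` on simulated data; the dataset, architecture, and metric are illustrative assumptions, and any held-out data and metric you care about would do):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Simulated labeled data stands in for whatever problem you actually have.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Performance on data the model has not yet seen is the validation that matters here.
print(accuracy_score(y_test, clf.predict(X_test)))
```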
In that regard, the theoretical concerns are of lesser importance. If you can show that your model predicts well, that is your validation of the assumptions (with the assumption, I suppose, being that the model will be able to predict well). Most of the common assumptions for OLS linear regression have to do with confidence intervals and hypothesis tests on coefficient values that you typically would not do with a neural network.
There are a few caveats to this.
If you have dependent categorical observations, such as in a time series, you might be able to use that dependence to make even better predictions, much as autocorrelation can be exploited in linear regression to get better estimates. Further, while I wrote that the ultimate test is whether or not your model is able to predict, do not take lightly the influence this kind of dependence can have on prediction quality, and do not assume you can ignore it and still get quality predictions. If you have or suspect this kind of dependence, it might be worth starting your modeling under the assumption that you need to account for it as part of your prediction strategy (e.g., including past values as model features, as in the sketch below, or maybe a recurrent architecture).
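A rough sketch of the lagged-feature idea (simulated binary series, scikit-learn's `MLPClassifier`; the number of lags and everything else here are assumptions for illustration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# A binary series with serial dependence, simulated purely for illustration.
rng = np.random.default_rng(0)
y = (rng.normal(size=500).cumsum() > 0).astype(int)

# Use lagged values of the series as features so the model can exploit the dependence.
n_lags = 3
X = np.column_stack([y[lag:len(y) - n_lags + lag] for lag in range(n_lags)])
target = y[n_lags:]

# Respect time order when validating: train on the past, evaluate on the future.
split = int(0.8 * len(target))
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X[:split], target[:split])
print(clf.score(X[split:], target[split:]))
```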
Equal variance isn’t really a concern in classification problems, as Bernoulli variables have different variances whenever they have different probability parameters. That is, unless you are in a boring situation where the prediction is constant, the variance is guaranteed to change. This idea generalizes to multi-class settings, too.
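Concretely, a Bernoulli outcome with success probability $p_i$ has variance $p_i\left(1-p_i\right)$, so any two observations with different predicted probabilities necessarily have different variances.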
For neural networks more general than classifiers (e.g., neural network regressions, multi-label problems), knowledge of the conditional distribution can influence how you fit the network parameters (e.g., the choice of loss function). If you have heavy tails, for instance, you might not be as inclined to use mean squared error as the loss function, because of the influence of extreme points (“outliers”). Since this question specifically asks about classifiers, where the conditional distribution is pretty much set as multinomial, this is not such an important concern, so I will only gesture at it with the sketch below.
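A brief sketch (PyTorch, with noise simulated from a heavy-tailed Student-$t$ distribution; every particular here is an illustrative assumption) of swapping squared error for a more robust Huber loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(500, 4)
# Heavy-tailed noise: a Student-t distribution with 2 degrees of freedom.
y = X @ torch.tensor([1.0, -2.0, 0.5, 0.0]) + torch.distributions.StudentT(2.0).sample((500,))

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.HuberLoss()   # less sensitive to extreme points than nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()
```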
Thus, while theoretical concerns that might be important for classical linear modeling are secondary to validated predictive ability when it comes to neural network classifiers, statistical theory does not just vanish.