Questions tagged [overfitting]

Modeling error (especially sampling error) instead of replicable, informative relationships among variables improves model fit statistics but reduces parsimony and worsens explanatory and predictive validity.

Models that involve complex polynomial functions or too many independent variables may fit particular samples' covariance structures overly well, such that some existing (and any potential, additional) terms increase model fit by modeling sampling error, not systematic covariance that is likely to replicate or represent theoretically useful relationships. When used to predict other data (e.g., future outcomes, out-of-sample data), overfitting increases prediction error.
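The polynomial case described above can be made concrete with a small, self-contained NumPy sketch (all data and names here are invented for illustration): fitting a needlessly high-degree polynomial to noisy linear data drives training error down while widening the gap to out-of-sample error.

```python
import numpy as np

rng = np.random.default_rng(42)

# True relationship is linear; the noise plays the role of sampling error.
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.025, 0.975, 20)  # out-of-sample points
y_test = 2.0 * x_test + rng.normal(scale=0.3, size=x_test.size)

def errors(degree):
    """Fit a polynomial of the given degree to the training data
    and return (training MSE, test MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return mse_train, mse_test

train_lo, test_lo = errors(1)    # parsimonious model (matches the truth)
train_hi, test_hi = errors(12)   # overly flexible model

# The degree-12 fit always achieves lower training error (nested least
# squares), but it does so by modeling noise, so its generalization gap
# (test error minus training error) is larger.
```

The comparison of `train_hi` against `test_hi` is the usual diagnostic: a near-zero training error combined with a much larger out-of-sample error signals that the extra polynomial terms are absorbing sampling error rather than structure.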

The Wikipedia page offers illustrations, lists of potential solutions, and special treatment of the topic as it relates to machine learning. See also:

Leinweber, D. J. (2007). Stupid data miner tricks: Overfitting the S&P 500. The Journal of Investing, 16(1), 15–22. Available online: http://www.finanzaonline.com/forum/attachments/econometria-e-modelli-di-trading-operativo/903701d1213616349-variazione-della-vix-e-rendimento-dello-s-p500-dataminejune_2000.pdf (accessed January 6, 2014).

Tetko, I. V., Livingstone, D. J., & Luik, A. I. (1995). Neural network studies. 1. Comparison of overfitting and overtraining. J. Chem. Inf. Comput. Sci. 35(5), 826–833. doi:10.1021/ci00027a006.

380 questions
7
votes
3 answers

Overfitting Question

Would you consider that overfitting?
4
votes
2 answers

over-fitting with good enough test accuracy

Let's make things simple. Imagine an underdetermined linear system with $N$ samples and $p$ features $(N < p)$…
arash
  • 85
  • 4
3
votes
1 answer

Cannot underfit/overfit on the IRIS dataset

I am playing with the IRIS dataset and want to see underfitting and overfitting in action. I am using a multilayer perceptron (2 layers). The problem is that I cannot underfit or overfit the data (see the plot below). I understand why I cannot…
Yuri
  • 139
  • 2
2
votes
1 answer

Is avoiding overfitting more important than overall score (F1: 80-60-40% or 43-40-40%)?

I've been trying to model a dataset using various classifiers. The response is highly imbalanced (binary), and I have both numerical and categorical variables, so I applied SMOTENC and random oversampling to the training set. In addition, I used…
Mehdi
  • 324
  • 1
  • 6
2
votes
0 answers

Overfitting and COLT/Statistical Learning Theory

Over-fitting is typically viewed from the perspective of both accuracy and model complexity. To mitigate over-fitting, we usually take the practical approach of k-fold cross-validation and training/validation/test splits. Question:…
sayan
  • 21
  • 2
2
votes
2 answers

Relation between "underfitting" vs "high bias and low variance"

What is the exact relation between "underfitting" and "high bias and low variance"? They seem to be tightly related concepts, but are still two distinct things. Same for "overfitting" vs. "high variance and low bias".…
lordy
  • 294
  • 2
  • 12
1
vote
2 answers

Is it bad to have a large gap between training loss and validation loss?

Say my training loss is 0.5 and my validation loss is 2.5 (both have stopped decreasing, validation loss never increased). I am clearly overfitting. If I add regularization, my training loss becomes 1 and validation loss 3.5. The first model clearly…
Deer Jona
  • 41
  • 1
  • 4
1
vote
0 answers

Overfitting reason in 2-stage model

I'm trying to build an entity matching model. There are 2 kinds of features - binary (0/1) and text features. Initially I made a deep learning model that uses character level embeddings of some of the text features, word level embeddings of the…
user9343456
  • 167
  • 9
0
votes
0 answers

How to reduce the overfitting in my CNN model?

I am new to this field and want to practice creating a convolutional neural network: a convolutional model for image classification. I want to classify images of women and men. Previously, I took an online course. For now, my code is this: import…
cleanet
  • 1
  • 1
0
votes
0 answers

Is a predictor with high information value bad? Is there another way to cross-check it?

So, I am preparing a dataset for an ML algorithm, but I have run into a problem: around 23 of 96 predictors have an IV greater than 0.5 (the lowest of these is 1.7), and I am curious whether that is really bad and whether it could cause overfitting of the…