
Consider the two pairs of learning curves below.

[Figure: two pairs of training/validation loss curves]

  1. The red and green lines are the training and validation curves of some model 1, and
  2. the gray and orange lines are the training and validation curves of some model 2.

Both models were trained on the same data; the only difference is that model 1 is more complex than model 2 (see below for full specs).

I initially had model 1 only, but because of the wide gap between its training and validation curves, I concluded that it suffered from overfitting. I then designed model 2 with fewer hidden units to mitigate this overfitting. Although the gap between training and validation shrank significantly, the overall loss worsened. I'm therefore confused about what this means. On the one hand, model 2 is "healthier" since it generalizes better, but on the other hand its predictive power is worse. I naively thought that reducing overfitting would also reduce the loss, but it seems there is a lose-lose trade-off between the two.

The setup:

  • 3 million data points, of which 20% are used for cross-validation.
  • 33 features that are either boolean or scalars normalized in the $[0, 1]$ range
  • 1 boolean target
  • The models are multi-layer perceptrons trained with the mean absolute error as the loss. The hidden units use ReLU activations and the output unit is a single sigmoid. (I wanted prediction scores in the $[0, 1]$ range, as opposed to a hard boolean classification, so as to have a quantification of confidence.)
  • The number of units of model 1 are [16, 8, 1] and those of model 2 are [8, 4, 1].
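For concreteness, here is a minimal NumPy sketch of the two architectures described above (forward pass only, with hypothetical random weights; the actual models were presumably trained in a deep-learning framework, and the helper names are my own):

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Random He-initialized weights; layer_sizes e.g. [33, 16, 8, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """ReLU hidden layers, single sigmoid output -> scores in [0, 1]."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)            # ReLU activation
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))     # sigmoid output

def mae(y_true, y_pred):
    """Mean absolute error, as used for the loss in the question."""
    return np.abs(y_true - y_pred).mean()

# 33 input features; model 1 hidden units [16, 8], model 2 hidden units [8, 4]
model1 = init_mlp([33, 16, 8, 1])
model2 = init_mlp([33, 8, 4, 1])

X = np.random.default_rng(1).random((5, 33))      # features in [0, 1]
y = np.array([[0.0], [1.0], [1.0], [0.0], [1.0]]) # boolean target
print(mae(y, forward(model1, X)), mae(y, forward(model2, X)))
```

The only structural difference between the two models is the hidden-layer widths, which matches the description of model 2 as a smaller ("more rigid") version of model 1.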
  • When you say that you have a Boolean target, do you mean that you’re doing a “classification” problem with a categorical outcome? Absolute loss is problematic for “classification” problems. – Dave Aug 23 '22 at 13:53
  • What is your question? It seems model 1 does better than model 2. Assuming the test/validation sets are a good sample from the population, I would opt for model 1. The overfitting doesn't really matter since the out-of-sample loss is smaller in model 1 than model 2. – Demetri Pananos Aug 23 '22 at 14:05
  • @Dave Ideally, I'm doing a regression problem because I'd like to interpret the targets as probabilities. It just so happens that the targets in my training set are boolean. – Tfovid Aug 23 '22 at 14:05
  • @DemetriPananos My problem is the overfitting in model 1. What can I do to reduce it without increasing the loss? Simply opting for the simpler (i.e., more "rigid") model 2 didn't fix this. – Tfovid Aug 23 '22 at 14:08

0 Answers