
Going through this lecture note on bias-variance trade-off, I didn't follow the latter part of this paragraph.

It shows the common situation in practice that

(1) for simple models, the bias increases very quickly, while

(2) for complex models, the variance increases very quickly. Since the riskiness is additive in the two, the optimal complexity is somewhere in the middle. Note, however, that these properties do not follow from the bias-variance decomposition, and need not even be true.

The 'It' in the above paragraph refers to the below image:

[Image: plot of squared bias, variance, and total error against model complexity — bias falls and variance rises as complexity increases]

Questions:

1) If these are properties, then why don't they follow from the bias-variance decomposition, which states that $E[(Y-\hat{f}(x))^2] = \sigma^2 + \text{Bias}^2(\hat{f}(x)) + \text{Var}(\hat{f}(x))$?

2) And under what conditions do they not hold?
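One way to see the heuristic at work is a small simulation. The sketch below is my own illustration, not from the lecture note: the true function, noise level, and polynomial degrees are arbitrary choices. It refits polynomials of increasing degree on many resampled training sets and estimates the squared bias and variance of the fitted curve at held-out points, so that the trade-off in the plot can be checked numerically.

```python
# Hypothetical simulation of the bias-variance trade-off: the true
# function sin(2*pi*x), the noise level, and the degrees are my own
# illustrative choices, not anything from the lecture note.
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

n_train, n_reps, sigma = 30, 200, 0.3
x_test = np.linspace(0.05, 0.95, 50)  # evaluation grid inside [0, 1]

results = {}
for degree in (1, 3, 9):  # "complexity" = polynomial degree here
    preds = np.empty((n_reps, x_test.size))
    for r in range(n_reps):
        # fresh training set each repetition: new x's and new noise
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, sigma, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coefs, x_test)
    # bias^2: squared gap between the average fit and the truth,
    # averaged over the test grid; variance: spread of fits across reps
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias2, variance)
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {variance:.4f}")
```

Under this setup the low-degree fit shows large bias and small variance, and the high-degree fit the reverse, which is exactly the behaviour the plot asserts heuristically. Note that nothing in the decomposition itself forces this: it is a property of how these particular estimators respond to the degree parameter.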

naive
  • Why should they follow from the decomposition? It doesn't say anything about complexity. The definition you have in 1) is for a given model complexity. As for 2), if you somehow had the true underlying model (not an estimate), then, because there is no concept of complexity in that case (we know the true model), the bias would not be a function of complexity and neither would the variance, so the relation would not hold. – mlofton Feb 01 '19 at 15:18
  • There may be a potential to mis-read the quotation. "These properties" refers to the assertions about rates of increase in bias and variance for "simple" and "complex" models, not to the bias and variance. At best these properties are heuristics--and obviously they are separate from the bias-variance trade-off relationship. – whuber Feb 01 '19 at 20:30
  • @mlofton The definition in 1) is for any model complexity. Re 2): why is there no complexity if we have the true model? Your second point contradicts the first: you say that the decomposition does not say anything about complexity, and at the same time you implicitly assume that it does. – naive Feb 01 '19 at 20:32
  • @whuber I must agree with your comment that these properties are heuristics. But they are nevertheless important for understanding the bias-variance trade-off relationship in a very intuitive way. I am just looking for something that will supplement my understanding. Cheers! – naive Feb 01 '19 at 20:38
  • While @whuber is correct that these are heuristics, another way to think about things is as a guide to what we mean by model complexity. Any reasonable definition of model complexity will cause these properties to be true. – Matthew Drury Feb 01 '19 at 20:50
  • @Matthew Drury If a reasonable definition of model complexity renders these properties true, are there conditions under which they aren't? – naive Feb 02 '19 at 03:58
  • @naive: the plot is using "complexity" to mean that the candidate models get more complex as one moves right along the x-axis. The properties don't explicitly follow from the definition of the decomposition. I mean that they do follow, but how would one put "complexity" into the formula? For 2), if you have the true model, then there is no concept of complexity, because the complexity on the x-axis implies that there are many models one can use, and if we have the true model, then there is only one model. Maybe my 2) is not a good example-answer, but 1) is clear to me. – mlofton Feb 02 '19 at 15:00
  • @mlofton I believe the "complexity" is already incorporated in $\hat{f}(x)$, because complexity can be thought of as a measure of the number of parameters to be estimated and how "free/independent" they are, or as the sensitivity of the model estimates to perturbations of the observations. – naive Feb 04 '19 at 09:15
  • @naive: Hi. It's incorporated in the sense you described, for sure, but it's not part of the definition of the decomposition. There's no way, as far as I know, to make bias and variance an explicit function of the components of complexity. – mlofton Feb 05 '19 at 16:59