
I'm trying to compare the statistical evaluation of neural networks and linear regression. According to several articles I found, the degrees of freedom (DOF) increase with model complexity for neural networks:

"Degrees of freedom, defined as parameter counts, have been frequently used in model selection."-- https://auai.org/uai2016/proceedings/papers/257.pdf

[Slide figure: "Degrees of Freedom in Deep Neural Networks"] https://deeplearning.web.unc.edu/files/2016/10/Degrees-of-Freedom-in-Deep-Neural-Networks-PPT.pdf

"It is much smaller than the total number of parameters..." -- https://link.springer.com/chapter/10.1007/978-3-540-70981-7_26

while they decrease with model complexity in linear regression (number of data points minus number of terms), which many sources confirm: "In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself" -- https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
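For concreteness, the classical linear-regression count can be checked numerically. A minimal sketch with made-up data (the sizes `n = 20` and the two-predictor design are illustrative, not from any of the cited sources):

```python
import numpy as np

# Sketch of the classical DOF count for linear regression:
# residual degrees of freedom = n - p (observations minus parameters).
rng = np.random.default_rng(0)
n = 20                                          # number of data points
X = np.column_stack([np.ones(n),                # intercept column
                     rng.normal(size=(n, 2))])  # two predictors
p = X.shape[1]                                  # number of parameters
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Ordinary least squares; lstsq also reports the rank of X
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
df_resid = n - rank                             # = n - p for full-rank X
print(df_resid)  # 17
```

Adding a term to the model increases `p` and so decreases `df_resid`, which is the sense in which more complexity means fewer degrees of freedom here.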

So given the same number of training data, a larger DOF means less complexity in linear regression but more complexity in a neural network. Why does the same term have different meanings in the two settings? This makes it harder to compare the two modeling methods.

reko34

1 Answer


The confusion probably arises because in a linear model there are multiple definitions of degrees of freedom. Depending on which one you use, you may get a discrepancy with the definition of degrees of freedom used for another model.

  • When a linear model is fitted, the space of all possible observations is split into two parts of dimensionality $p$ and $n-p$, where $p$ is the number of parameters and $n$ is the number of observations. See for instance the question Why are the residuals in $\mathbb{R}^{n-p}$?

  • The linear model fit results in two vectors, one in each of those two spaces: 1. the modelled estimate, 2. the residual.

  • If we compute an F-statistic, we compare (as a ratio) the sums of squares of these two vectors, each divided by the dimensionality of its space. That dimensionality is also the 'degrees of freedom': a measure of how many parameters are 'free to vary'. But, depending on how you look at it, different quantities can be described as the degrees of freedom: $n-p$ (in how many ways the residual vector can differ), $p$ (in how many ways the model can differ), or $p-1$ (in how many ways the model can differ relative to the mean of the data).