
I am preparing to conduct a study and am interested in using a neural network approach. Is there a way to roughly work out the sample size that would be needed?

1 Answer


There are two rules of thumb that I know of:

  • There should be approximately 30 times more training cases than weights (Neural Network FAQ).

  • A more general rule for generalization: there should be 10 times more training cases than the VC dimension of the hypothesis set. For neural networks the VC dimension is usually taken to be roughly the number of weights, so again you should have about 10 times more training cases than weights (this rule is presented, for example, in the Learning From Data course by Dr. Abu-Mostafa; if you need a reference, you can probably find it in his book). Both rules are illustrated in the sketch below.
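To make the two rules concrete, here is a minimal Python sketch that counts the trainable parameters of a fully connected network and applies both multipliers. The architecture and the helper name `count_weights` are illustrative assumptions, not part of either rule:

```python
# Minimal sketch: apply both rules of thumb to a dense feed-forward network.
# The layer sizes below are an illustrative assumption, not a recommendation.

def count_weights(layer_sizes):
    """Trainable parameters (weights + biases) of a fully connected MLP."""
    return sum((n_in + 1) * n_out          # +1 accounts for each unit's bias
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layers = [30, 20, 10, 1]   # 30 inputs -> two hidden layers -> 1 output
w = count_weights(layers)

print(f"weights:                 {w}")     # 841 for this architecture
print(f"30x rule (NN FAQ):       {30 * w} training cases")
print(f"10x VC-dimension rule:   {10 * w} training cases")
```

For this example network the rules give roughly 25,000 and 8,400 training cases respectively; both are order-of-magnitude guides rather than hard requirements.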

BartoszKP
  • In general in statistical modeling the sample size is in the range of $20\times (p + q)$ where $p$ is the number of parameters in the final model and $q$ is the number of parameters that may have been examined but discarded along the way. Is the number of weights in a NN equivalent to $p$? Are any nodes discarded in the usual algorithms, so that you really need to consider $p + q$? – Frank Harrell May 17 '18 at 11:32
  • @FrankHarrell The answer is probably yes, by reasoning very similar to the one that leads to the simplification that the VC dimension is roughly the number of weights. In complicated NNs it is sometimes possible that some weights end up being unused or irrelevant, but it is impossible to know this before training the model. Additionally, the VC dimension is also said to correspond to the number of parameters in the model, or to the number of degrees of freedom. – BartoszKP May 17 '18 at 11:43
  • Very helpful. Does the VC dimension account for "phantom nodes" that had the opportunity to be used but weren't? I think it needs to. – Frank Harrell May 17 '18 at 11:56
  • @FrankHarrell By its definition, yes: phantom nodes shouldn't be included in its value. However, in practice it's usually too hard or impossible to take this into account (for the reason stated in my previous comment). – BartoszKP May 17 '18 at 12:00
  • I'm having trouble reconciling "yes" and "shouldn't" - sorry if I'm misinterpreting. – Frank Harrell May 17 '18 at 12:24
  • @FrankHarrell Sorry for being unclear. I meant that, IIRC, the VC dimension should by its definition account for phantom nodes in the sense that they shouldn't be counted. But in practice they are; most of the time it is just assumed that all weights count. – BartoszKP May 17 '18 at 13:41
  • In a regression modeling context, phantom degrees of freedom must be counted (e.g., variables given an opportunity to be selected but weren't), otherwise biases creep into all quantities. It's not clear why the same thing doesn't happen in neural nets. – Frank Harrell May 18 '18 at 01:50
  • Interesting discussion. I just want to share a paper in which the 10-times bound is considered insufficient, and the authors propose a safer margin of 50x: https://www.sciencedirect.com/science/article/pii/S1755534518300058?ref=pdf_download&fr=RR-2&rr=862f73623b8136eb#sec4 – Ricardo Barros Lourenço Mar 12 '24 at 00:03