After reading matus's beautiful answer in this thread, which explains (among other things) Kolmogorov's superposition result behind the Universal Approximation Theorem for neural networks, I wonder: if just $\mathcal{O}(d^2)$ nodes can exactly replicate (!) the target function, provided we are allowed to use different transfer functions, does there exist theory, or are there algorithms, for inferring from the training data which transfer functions to use?
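To make the question concrete, here is a minimal sketch (not an answer, and not any established algorithm from the literature) of the crudest possible data-driven selection: fit the same single-hidden-layer network under each candidate transfer function and keep whichever one minimizes validation error. The random-feature setup (fixed hidden weights, least-squares output layer) and the candidate set are my own assumptions for illustration.

```python
import numpy as np

def fit_random_features(X, y, activation, n_hidden=50, seed=0):
    """Fix random hidden weights, solve the output layer by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = activation(X @ W + b)                      # hidden-layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights
    return W, b, beta

def predict(X, W, b, beta, activation):
    return activation(X @ W + b) @ beta

# Hypothetical candidate set of transfer functions to choose among.
candidates = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(z, 0.0),
}

# Toy regression target: y = sin(2x) plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=400)
X_tr, y_tr = X[:300], y[:300]
X_va, y_va = X[300:], y[300:]

# Score each candidate by mean squared error on held-out data.
scores = {}
for name, act in candidates.items():
    W, b, beta = fit_random_features(X_tr, y_tr, act)
    err = np.mean((predict(X_va, W, b, beta, act) - y_va) ** 2)
    scores[name] = err

best = min(scores, key=scores.get)
print(best, scores[best])
```

Of course this only *selects* from a hand-chosen menu; the question is whether theory exists for inferring the transfer functions themselves, as the Kolmogorov-style constructions seem to require.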
Asked by Alexandre Holden Daly
- My understanding is that applied theory has gone more in the direction of identifying a few standard transfer functions, with parameter adjustments, that [nearly] support universality, i.e. mainly the sigmoid function... – vzn Jan 20 '14 at 15:37
- So it seems, but why? I crave theory that justifies the path taken by applied research! Deep learning shouldn't be a black box... – Alexandre Holden Daly Jan 21 '14 at 00:25