In multi-class classification using nearest neighbour, I believe that as the dimension of the space increases, we need exponentially more samples to keep the classification error below a given threshold. Is there any theoretical support for this intuition? I am considering scenarios where the decision boundary is deterministic (not probabilistic) and the data points are uniformly distributed over a bounded domain in $\mathbb{R}^n$.
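A back-of-envelope covering argument behind my intuition (assuming data uniform in the unit hypercube $[0,1]^n$): for every query point to have a sample within distance $\varepsilon$, the samples must form an $\varepsilon$-cover of $[0,1]^n$, which requires on the order of $(1/\varepsilon)^n$ points, i.e. a sample size exponential in $n$ for fixed $\varepsilon$. I am asking whether a result of this kind is made rigorous anywhere for nearest-neighbour classification error.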
You can turn this into a geometry problem if you specify the space (e.g. uniformly distributed in a unit $n$-ball or hypercube, or iid $N(0, I_n)$, or whatever) and look for the sample size needed to have a given probability $p$ of at least one neighbour within a given distance $d$. Related: https://stats.stackexchange.com/questions/130998/explanation-of-formula-for-median-closest-point-to-origin-of-n-samples-from-unit and https://stats.stackexchange.com/questions/122334/how-to-solve-this-problem-on-curse-of-dimensionality-problem-nearest-neighbour – Henry Mar 15 '24 at 11:02
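A minimal sketch of the calculation Henry suggests, assuming points uniform in the unit hypercube $[0,1]^n$ and a query at the centre; `ball_volume` and `samples_needed` are illustrative names, and the closed-form $P = 1 - (1 - V)^N$ uses the $n$-ball volume for $V$, which is exact here since a ball of radius $d < 1/2$ around the centre stays inside the cube:

```python
import math

def ball_volume(n, d):
    """Volume of an n-ball of radius d."""
    return math.pi ** (n / 2) * d ** n / math.gamma(n / 2 + 1)

def samples_needed(n, d, p):
    """Smallest N such that N iid uniform points in the unit hypercube
    include, with probability >= p, at least one point within distance d
    of the centre: solve 1 - (1 - V)^N >= p for N, where V is the
    probability that a single point lands in the d-ball."""
    V = ball_volume(n, d)
    if V >= 1:
        return 1
    return math.ceil(math.log(1 - p) / math.log(1 - V))

for n in (1, 2, 5, 10, 20):
    print(n, samples_needed(n, d=0.1, p=0.5))
```

With $d = 0.1$ and $p = 0.5$ this gives $N = 4$ in one dimension but roughly $3 \times 10^8$ in ten, which is the exponential blow-up the question is about.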