
I know the training error for k=1 is 0, because each point picks itself as its closest neighbour. But what would the training error be if k equals the sample size? Say there are 20 reds and 30 blues; is the training error simply 20/30 = 66.6%?

Math Avengers

1 Answer


In $k$NN, if you pick $k$ equal to the training set size, $k=N$, then every query's $k$ nearest neighbors are the entire training set. For classification with majority voting, the prediction is therefore always the largest class, so the training accuracy equals the proportion of the largest class, and the training error is the proportion of everything else. In your example the majority class is blue, so every point is predicted blue and the training error is $20/50 = 40\%$, not $20/30$. For regression, the prediction is the average of all the training targets, $\hat y = \bar y$, so the training mean squared error equals the variance of the targets, $\tfrac{1}{N} \sum_{i=1}^N (y_i - \bar y)^2$.
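A quick numerical check, as a sketch with scikit-learn (the 20/30 class split follows the question; the feature values are made-up random noise):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(0)

# 20 "red" and 30 "blue" points; the features are arbitrary noise
X = rng.normal(size=(50, 2))
y = np.array([0] * 20 + [1] * 30)   # 0 = red, 1 = blue

# k = N: every query sees the whole training set, so majority vote wins
clf = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
acc = clf.score(X, y)
print(acc)      # 0.6 -> proportion of the largest class (30/50)
print(1 - acc)  # 0.4 -> training error = 20/50, not 20/30

# Regression analogue: with k = N the prediction is always the mean,
# so the training MSE equals the variance of the targets
y_reg = rng.normal(size=50)
reg = KNeighborsRegressor(n_neighbors=len(X)).fit(X, y_reg)
mse = np.mean((reg.predict(X) - y_reg) ** 2)
print(np.isclose(mse, y_reg.var()))  # True
```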

By the way, for $k=1$ you can still end up with a non-zero training error if your data contain observations whose features are identical but whose labels differ. In that case the algorithm has to "randomly" break the tie between equidistant neighbours, and it will necessarily get at least one of the tied points wrong.
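A minimal sketch of that tie situation, again with scikit-learn (toy data made up for illustration): two points share the feature value 0.0 but carry different labels, so whichever label the tie-break yields, one of them is misclassified.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two observations with identical features but different labels
X = np.array([[0.0], [0.0], [1.0], [2.0]])
y = np.array([0, 1, 0, 0])

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.score(X, y))  # 0.75, not 1.0: both tied points get the same
                        # predicted label, so one of them must be wrong
```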

Tim