Questions tagged [k-nearest-neighbour]

A non-parametric method for classification and regression. The input consists of the $k$ closest training examples in the feature space; the output is the mode of the neighbors' labels (in classification) or the mean of their values (in regression).

The k-nearest neighbors algorithm (k-NN) is a non-parametric method, first developed by Evelyn Fix and Joseph Hodges and later expanded by Thomas Cover, used for classification and regression. In both cases, the input consists of the $k$ closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

  • In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its $k$ nearest neighbors ($k$ is a positive integer, typically small). If $k=1$, then the object is simply assigned to the class of that single nearest neighbor.
  • In k-NN regression, the output is the property value for the object. This value is the average of the values of its $k$ nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, normalizing the training data can improve its accuracy dramatically.

Source: Wikipedia.
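
A minimal sketch of the two rules described above, using NumPy; the function names and the choice of Euclidean distance are illustrative assumptions, not a canonical implementation:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=5):
    """Plurality vote: the class most common among the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k=5):
    """Average of the values of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()
```

Note that nothing is "fit" here: all work happens at prediction time, which is exactly the lazy-learning point made above.
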

565 questions
29 votes, 4 answers

Why do you need to scale data in KNN?

Could someone please explain to me why you need to normalize data when using k-nearest neighbors? I've tried to look this up, but I still can't seem to understand it. I found the following…
bugsyb
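
A hedged illustration of why scaling matters, using scikit-learn on made-up data; the feature names, sizes, and seed are assumptions for the example:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on wildly different scales.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, 300)   # dollars
age = rng.normal(40, 12, 300)              # years
X = np.column_stack([income, age])
y = (age > 40).astype(int)                 # the label depends only on age

# Unscaled, income dominates the Euclidean distance and age is drowned out.
raw = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print(cross_val_score(raw, X, y, cv=5).mean())     # near chance
print(cross_val_score(scaled, X, y, cv=5).mean())  # far better
```
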
7 votes, 1 answer

Why is the space complexity of KNN $O(dN)$?

If you Google search the space complexity of KNN, virtually all answers say that it costs $O(dN)$, where $d$ is the dimension of the data and $N$ is the number of data points. But why? I understand that we need to store $d \times N$ numbers, so…
Fraïssé
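
A quick back-of-the-envelope check of the $O(dN)$ figure, assuming the training set is stored as 8-byte floats; the concrete $N$ and $d$ are arbitrary:

```python
N, d = 1_000_000, 50    # number of training points and dimensions (arbitrary)
bytes_per_float = 8     # assuming float64 storage
# kNN is lazy: the "model" is the training set itself, so memory is d * N numbers.
print(N * d * bytes_per_float / 1e9, "GB")  # 0.4 GB, linear in both d and N
```
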
6 votes, 1 answer

Why is $k = \sqrt{N}$ a good solution of the number of neighbors to consider?

In $k$-NN it is often stated that a good starting number of neighbors to select is $k = \sqrt{N}$, where $N$ is the total number of points. But why is this so? Examples: Section 10.2.3.2: "When we have a new (test) data point, we want to find out…
Sos
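
One way to treat $\sqrt{N}$ as a starting point rather than an answer is to cross-validate around it. A sketch with scikit-learn; the dataset and the candidate grid are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
k0 = int(np.sqrt(len(X)))   # the sqrt(N) rule of thumb: k0 = 20 here

# Let cross-validation refine the rule of thumb.
for k in (k0 // 2, k0, 2 * k0):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(score, 3))
```
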
6 votes, 1 answer

What is the optimal $k$ for the $k$ nearest neighbour classifier on the Iris dataset?

What is the optimal value of $k$ for an unweighted Euclidean kNN classifier applied to the Iris data set, where "optimal" means the value of $k$ that leads to the lowest generalisation error?
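
The empirical version of this question can be sketched with a cross-validated grid search in scikit-learn; the fold count and range of $k$ below are arbitrary choices, and CV error is only an estimate of the generalisation error:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Unweighted Euclidean distance is the KNeighborsClassifier default.
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": list(range(1, 51))}, cv=10)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```
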
5 votes, 2 answers

Doing low-dimensional KNN on a large dataset

I have a simple two-dimensional dataset with columns X1, X2, and [outcome], and I want to try KNN (probably $k$ around 100 or 1000, though ideally CV would be possible). The problem is that my dataset has a couple million rows. Are there any preferred…
DavidShor
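
In two dimensions, tree-based indexes handle this size comfortably. A sketch with scikit-learn's kd-tree backend on synthetic stand-in data; the column meanings and sizes are assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a couple million rows of (X1, X2, outcome).
rng = np.random.default_rng(0)
X = rng.uniform(size=(2_000_000, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=len(X))

# In 2 dimensions a kd-tree keeps queries roughly logarithmic in N,
# so a large k on millions of rows stays tractable.
model = KNeighborsRegressor(n_neighbors=100, algorithm="kd_tree").fit(X, y)
print(model.predict([[0.5, 0.5]]))
```
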
4 votes, 0 answers

Dynamic approximate nearest neighbor search?

I'm working on developing an algorithm that uses latent spaces, and the ability to quickly find a point's nearest neighbors in latent space would help dramatically. The following is important for any algorithm I use: Fast (sublinear) retrieval of…
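
One library that fits these requirements is hnswlib: HNSW graphs give approximate sublinear queries and support incremental insertion (deletions are more limited, via soft-delete marks). A sketch, with all sizes and parameters chosen arbitrarily:

```python
import numpy as np
import hnswlib

dim, n = 64, 10_000                        # arbitrary sizes for illustration
data = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=100_000, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

# Points can be inserted later without rebuilding the index ("dynamic").
extra = np.random.rand(5, dim).astype(np.float32)
index.add_items(extra, np.arange(n, n + 5))

# Approximate k-NN query; ef trades accuracy for speed.
index.set_ef(50)
labels, distances = index.knn_query(data[:3], k=10)
```
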
4 votes, 1 answer

Application of 1-nearest-neighbor

In the book The Elements of Statistical Learning by Trevor Hastie, there is a sentence (on page 17) saying: In fact 1-nearest-neighbor, the simplest of all, captures a large percentage of the market for low-dimensional problems. I am just very…
WCMC
3 votes, 1 answer

kNN speed-up using the triangle inequality

Can someone explain how the triangle inequality helps speed up kNN? I understand the general principle of the triangle inequality, but I don't see how a lower bound on $d(x_1, x_2)$ would help with computation. Lower bound from: $$d(x_1,x_2) \ge…
Dahlai
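
The point of the bound is pruning: if a cheap lower bound on $d(q, x_i)$ already exceeds the best distance found so far, the exact distance never needs to be computed. A 1-NN sketch assuming pairwise training distances $D$ have been precomputed (in practice metric trees get the same effect without the $O(N^2)$ table):

```python
import numpy as np

def nn_with_pruning(X, q, D):
    """1-NN search using precomputed pairwise distances D[i, j] = d(x_i, x_j).

    The reverse triangle inequality gives
    d(q, x_i) >= |d(q, x_p) - d(x_p, x_i)| for any pivot x_p;
    if that bound already exceeds the best distance found so far,
    d(q, x_i) need not be computed.
    """
    best_i, best = 0, np.linalg.norm(q - X[0])
    pivot, pivot_d = 0, best
    for i in range(1, len(X)):
        if abs(pivot_d - D[pivot, i]) >= best:
            continue                       # pruned: cannot beat the current best
        d = np.linalg.norm(q - X[i])
        if d < best:
            best_i, best = i, d
            pivot, pivot_d = i, d          # use the new best point as the pivot
    return best_i, best
```
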
2 votes, 1 answer

All nearest neighbors in high dimensional space

Suppose I have a very large binary matrix representing $n$ customers and the $m$ products they bought, with $n$ and $m$ both rather large (in the order of millions). The matrix is also pretty sparse. As computing all (or even just the $k$) exact…
cjauvin
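
For sparse binary rows, one standard approximate approach is MinHash plus locality-sensitive hashing, which buckets similar customers without any all-pairs computation. A sketch with the datasketch library; the toy customers and the 0.5 threshold are assumptions:

```python
from datasketch import MinHash, MinHashLSH

# Each customer is the set of product ids they bought (hypothetical data).
customers = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9, 10}}

# MinHash approximates Jaccard similarity; LSH buckets similar sketches
# so candidate neighbors are found without comparing every pair.
lsh = MinHashLSH(threshold=0.5, num_perm=128)
sketches = {}
for name, items in customers.items():
    m = MinHash(num_perm=128)
    for item in items:
        m.update(str(item).encode("utf8"))
    sketches[name] = m
    lsh.insert(name, m)

print(lsh.query(sketches["a"]))  # approximate neighbors of customer "a"
```
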
2 votes, 1 answer

What does the k-NN algorithm do with training points equidistant from the test point?

This was my professor's interpretation, but he didn't provide an example: there could be training points at the same distance from $x$ such that more than $k$ points are closest to $x$. In this case, we proceed by ranking the training points based on their…
Milo
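
A tiny illustration of the tie situation: with a stable sort, which of the equidistant points makes the cut is decided by index order, i.e. by an implementation convention rather than by the algorithm itself. The data below are made up:

```python
import numpy as np

# Four 1-D training points; two of them are tied at distance 1 from x = 0.
X = np.array([[1.0], [-1.0], [2.0], [-2.0]])
y = np.array([0, 1, 0, 1])
x = np.array([0.0])

dists = np.linalg.norm(X - x, axis=1)            # [1, 1, 2, 2]
k = 3
nearest = np.argsort(dists, kind="stable")[:k]   # ties broken by index order
print(nearest, y[nearest])                       # [0 1 2] [0 1 0]
```
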
2 votes, 1 answer

Can you derive variable importance from a nearest neighbor algorithm?

While the helpful tooltip warned me this question was subjective, I don't think it is. It should be fairly objective to state, from a theoretical perspective, whether or not you can establish the importance of any given feature in a KNN-type…
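
Although kNN has no coefficients to inspect, model-agnostic measures still apply. A sketch of permutation importance with scikit-learn; the dataset and repeat count are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Shuffle one feature at a time and measure the drop in accuracy:
# a model-agnostic importance that works even for lazy learners like kNN.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```
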
1 vote, 1 answer

Can KD-tree introduce bias to nearest neighbor search?

When performing k-nearest neighbor analysis on a large dataset, using a kd-tree algorithm can greatly speed up the search. I've tried researching this question, but have not found an answer - can the use of a kd-tree introduce bias into the nearest…
rw2
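
A quick empirical check, assuming an exact kd-tree as in scikit-learn: it should return the same neighbors as brute force (up to tie-breaking on exactly equal distances), so the speed-up itself introduces no bias. The data below are synthetic:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).normal(size=(10_000, 3))

brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X)
tree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)

# Exact kd-tree search returns the same neighbor indices as brute force.
q = X[:100]
assert np.array_equal(brute.kneighbors(q)[1], tree.kneighbors(q)[1])
```
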
1 vote, 2 answers

For what types of problems does nearest neighbor perform better?

I'm trying to predict house prices. I use features like the area of the house, age of the house, etc. It turns out that the kNN (k-nearest neighbor) algorithm beats all the other powerful algorithms, like neural networks, SVMs, and linear regression. What…
1 vote, 0 answers

KNN problem with outlier

May I ask why a KNN model with $K = 1$ has a strange blue dot on the left-hand side of both the Bayes decision boundary and the KNN decision boundary? I extracted this picture from ISLR.
1 vote, 2 answers

Hypothetically, can we perfectly classify new instances given infinite support from validation and training? KNN

Hypothetically speaking, given infinite amounts of training data and validation data, can we achieve perfect classification (a score of 1) with ML algorithms such as KNN? Thank you.
skidjoe