7

I've read in several papers that K-nearest neighbour (KNN) can be supervised or unsupervised learning. Is KNN always unsupervised when one uses it for clustering, and supervised when one uses it for classification? I'd also like to know whether there is an unsupervised KNN for classification as well.

Thanks in advance! Phil

Phil Werner
    As far as I know K nearest neighbours is a supervised algorithm. What are your sources for it being used as an unsupervised algorithm? Are you sure you're not confusing it with K means? – Denziloe Aug 23 '18 at 21:45
  • cheuk yup ip et al refer to the K nearest neighbor algorithm as unsupervised in a paper titled "automated learning of model classification", but most sources classify KNN as a supervised ML technique. – AMINU LAWAL Nov 25 '18 at 20:15
  • It's obviously supervised since it takes labeled data as input. – Digio Feb 09 '19 at 13:16
  • I also found that it can be applied as both supervised and unsupervised learning. For example, for anomaly detection the pyod library uses it as an unsupervised method, while in sklearn it is a supervised method. – Andrea Ciufo Jan 24 '23 at 19:02

1 Answer

6

Assuming K is given, strictly speaking KNN does not involve any learning: there are no parameters we can tune to improve performance, and we are not optimizing an objective function over the training data set. This is a major difference from most supervised learning algorithms.
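To make the "no learning" point concrete, here is a from-scratch sketch of a KNN classifier (the class name and interface are my own, for illustration only): the `fit` step merely memorises the training data, with no weights optimised and no objective minimised.

```python
import numpy as np

class KNNClassifier:
    """Illustrative k-nearest-neighbour classifier (not a library class)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just storing the data -- nothing is optimised here.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Euclidean distance to every stored training point.
            dists = np.linalg.norm(self.X_ - x, axis=1)
            # Labels of the k closest points, decided by majority vote.
            nearest = np.argsort(dists)[: self.k]
            labels, counts = np.unique(self.y_[nearest], return_counts=True)
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)

# Two well-separated clusters, labelled 0 and 1.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

clf = KNNClassifier(k=3).fit(X_train, y_train)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # → [0 1]
```

All of the work happens at prediction time, which is why KNN is often called a "lazy" learner.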

It is a rule that can be applied at prediction time to classify or cluster an instance based on its neighbors. Computing the neighbors does not require labels, but labels can be used to make the decision in classification.
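As the comments note, scikit-learn exposes both flavours, which is a minimal way to see the distinction: `NearestNeighbors` performs a purely unsupervised neighbour search (no labels involved), while `KNeighborsClassifier` runs the same search but lets the neighbours' labels vote. A short sketch:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Unsupervised: find the 2 nearest neighbours of a query point; y is never used.
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = nn.kneighbors([[0.5, 0.5]])
print(indices)  # indices of the two closest training points

# Supervised: the same neighbour search, but the neighbours' labels decide.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # → [0 1]
```

The distance computation is identical in both cases; only the use (or not) of the labels differs.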

Haitao Du
  • Interesting point of view -- I honestly do not know if there is a canonically acceptable answer to this. But I do think that it fulfills the requirements of being an unsupervised learning program: as you add more data to it, the performance improves, indicating that there is some learning involved. Mathematically the information is saved as the connections between neighbours (and weights, where applicable). Since the number of connections grows superlinearly with the number of vertices, there is definitely enough "memory" to effectively learn as new datapoints appear. Just IMO. – Debanjan Basu May 27 '19 at 15:21
  • Totally agree with you! So many answers to this question have said "supervised learning", which I do not agree with since there is no "learning" component... But I guess approximate nearest neighbours could be seen as clustering or supervised "machine learning", since it learns a tree structure using distances between points. – haneulkim Oct 13 '22 at 16:51
  • In sklearn it is listed under supervised models; I think for this reason it can be misleading and create confusion with what is written in this paper. – Andrea Ciufo Jan 24 '23 at 19:09