
I am doing a cluster analysis based on the $k$th-nearest-neighbor (KNN) method in SAS. The CLUSTER procedure requires me to specify $k$ (the number of neighbors used for KNN density estimation), and I was wondering whether there is a method or criterion for choosing it.

According to the SAS manual,

$k$th-nearest-neighbor density linkage is strongly set consistent for high-density (density-contour) clusters if $k$ is chosen such that $\frac{k}{n} \rightarrow 0$ and $\frac{k}{\ln(n)} \rightarrow \infty$ as $n \rightarrow \infty$.

Does this answer my question? Any good papers on the application of this method would be helpful as well.

Ken

1 Answer


You can, and generally should, use cross-validation: try a set of different values for $k$ and keep the one that looks best and produces the best results.
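
To keep the number of trials small, one option is to search only a short grid of candidates rather than every possible value. The sketch below is an assumption for illustration, not something the SAS documentation prescribes: it takes $k = n^a$ for a few exponents $0 < a < 1$, a family that satisfies both limits quoted in the question ($k/n \rightarrow 0$ and $k/\ln(n) \rightarrow \infty$). The function name `candidate_ks` and the default exponents are hypothetical choices, and the asymptotic conditions say nothing about which candidate is best for a finite sample.

```python
def candidate_ks(n, exponents=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return a short, sorted list of candidate k values for n observations.

    Each candidate is round(n ** a) with 0 < a < 1, so that k/n -> 0 and
    k/ln(n) -> infinity as n grows. The exponents are illustrative only.
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    ks = set()
    for a in exponents:
        if not 0.0 < a < 1.0:
            raise ValueError("each exponent must lie strictly between 0 and 1")
        k = round(n ** a)
        # Clamp to a usable range: at least 2 neighbors, fewer than n.
        ks.add(min(max(k, 2), n - 1))
    return sorted(ks)


if __name__ == "__main__":
    # Print the grid for a few sample sizes.
    for n in (100, 1000, 10000):
        print(n, candidate_ks(n))
```

Each candidate could then be supplied as the number of neighbors in a separate clustering run, and the resulting solutions compared by cross-validation or visual inspection.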

ktdrv
  • Thanks for your suggestion, but cross-validation is what I've been trying, and it can be tedious when there are so many different values to try... I want to know if there is a legitimate way to pick the value of $k$. – Ken Apr 23 '12 at 19:12
  • I don't know what to tell you. The optimal k parameter can vary greatly based on your data. Cross-validation or visual examination of the data is the "legitimate" way. – ktdrv Apr 26 '12 at 21:12