8
  • Is it possible to do 2-stage cluster analysis in R?
  • Can anybody point me to resources on it?
Jeromy Anglim
Beta
  • You mean like clustering on a sample to quickly get good starting centroids for the 'full' pass? Or more like two different methods altogether? – Marcin Apr 28 '11 at 00:06
  • Are you referring to the clustering algorithm which SPSS calls two-step? http://www.spss.ch/upload/1122644952_The%20SPSS%20TwoStep%20Cluster%20Component.pdf – Jeromy Anglim Apr 28 '11 at 02:13
  • Yeah! I'm talking about the SPSS kind of 2-step clustering. But I want to do it using R. – Beta Apr 28 '11 at 04:08

3 Answers

6

The closest package that I can think of is birch, but it is no longer available on CRAN, so you have to get the source and install it yourself (R CMD INSTALL birch_1.1-3.tar.gz works fine for me on OS X 10.6 with R version 2.13.0 (2011-04-13)). It implements the original algorithm described in

Zhang, T. and Ramakrishnan, R. and Livny, M. (1997). BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1, 141-182.

which relies on a cluster feature (CF) tree, as I believe SPSS TwoStep does (I cannot check, though). You can also run the k-means algorithm on a birch object (kmeans.birch()); that is, partition the subclusters into k groups such that the sum of squared distances from all the points in each subcluster to its assigned cluster center is minimized.
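For readers who cannot install birch, the two-stage idea can be roughly sketched in base R. This is not BIRCH and not the birch package's API: no CF-tree is built. As an assumption, stage 1 uses kmeans() with many centers as a stand-in for the pre-clustering pass, and stage 2 runs Ward hierarchical clustering on the subcluster centers. The toy data and the values of n_sub and k are made up for illustration.

```r
set.seed(42)

# Toy data (hypothetical): three Gaussian blobs in 2D, 200 points each
x <- rbind(
  matrix(rnorm(400, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(400, mean = 4, sd = 0.5), ncol = 2),
  cbind(rnorm(200, mean = 0, sd = 0.5), rnorm(200, mean = 4, sd = 0.5))
)

n_sub <- 20  # number of stage-1 subclusters (assumed value)
k <- 3       # number of final clusters (assumed value)

# Stage 1: compress the data into many small subclusters.
# Plain k-means is used here as a stand-in for the CF-tree pass.
pre <- kmeans(x, centers = n_sub, nstart = 5, iter.max = 50)

# Stage 2: hierarchical clustering of the subcluster centers only
hc <- hclust(dist(pre$centers), method = "ward.D2")
sub_to_final <- unname(cutree(hc, k = k))

# Map every original point to a final cluster via its subcluster
final <- sub_to_final[pre$cluster]
table(final)
```

Stage 2 only sees n_sub rows rather than the full data set, which is what makes this kind of scheme attractive for large data. Note that this sketch ignores subcluster sizes in stage 2, whereas kmeans.birch() as described above accounts for all the points in each subcluster.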

chl
  • Thanks a lot, chl! It seems to be a very rare package, as I didn't come across it even after a huge amount of searching. – Beta May 01 '11 at 06:11
  • @user4278 I also found some hits with Weka (e.g., here) but I didn't look further. – chl May 01 '11 at 07:17
3

Maybe this can also help: https://cran.r-project.org/web/packages/prcr/

Provides an easy-to-use yet adaptable set of tools to conduct person-center analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) doi:10.1002/(SICI)1521-4036(199910)41:6<753::AID-BIMJ753>3.0.CO;2-K, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.
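The procedure in that description (hierarchical clustering first, then k-means started from the resulting partition) can be sketched with base R alone. This is a minimal illustration under stated assumptions, not prcr's own API: the toy data, the choice of Ward linkage, and the value of k are all made up for the example.

```r
set.seed(1)

# Toy data (hypothetical): two well-separated groups on three variables
x <- rbind(
  matrix(rnorm(150, mean = 0), ncol = 3),
  matrix(rnorm(150, mean = 5), ncol = 3)
)
k <- 2  # number of clusters (assumed value)

# Step 1: hierarchical clustering gives an initial partition
hc <- hclust(dist(x), method = "ward.D2")
init <- cutree(hc, k = k)

# Centroids of the initial partition: per-cluster column sums
# divided by per-cluster counts, one row per cluster
init_centers <- rowsum(x, init) / as.vector(table(init))

# Step 2: k-means refined from those centroids
fit <- kmeans(x, centers = init_centers)

# Cross-tabulate the initial and the refined assignments
table(hierarchical = init, kmeans = fit$cluster)
```

Passing a matrix as centers makes kmeans() start from those rows instead of from randomly chosen points, so the hierarchical step decides where the iteration begins.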

vasili111
  • Thanks for your answer! But the question is from 8 years ago. I guess it will be useful for someone else. – Beta Oct 31 '19 at 05:27
0

If you are looking for something akin to 2-step clustering, have you considered looking into Self-Organizing Maps? I think they are based on a similar (but not the same) principle as 2-step.