
How can I calculate the C-Index in Stata to determine a 'good' number of groups in a cluster analysis? It is mentioned for R in this post (What is an acceptable value of the Calinski & Harabasz (CH) criterion?), but Stata does not seem to provide a built-in solution.

Thank you!

yumba
  • What makes it difficult for you to calculate it once you understand it? Or haven't you understood it yet? – ttnphns Nov 25 '13 at 16:33
  • P.S. My macro for SPSS computes it. But you said you want code for Stata... – ttnphns Nov 25 '13 at 16:35
  • Where can I find your macro? It seems it could be implemented in Stata using the cluster programming subroutines, but I have no experience programming that kind of thing. – yumba Nov 25 '13 at 16:45
  • OK then, if you have SPSS, try it. Visit my web page and download "Clustering criterions". The documentation is only partly in English, so if you have questions, email me. Please note: 1) the C-Index takes time to compute (I don't recommend the macro if you have, say, 500+ objects), but the point-biserial r is fast and often gives similar results; 2) the C-Index is just one of many clustering indices, and you might want to choose another (e.g. the Silhouette is quite popular nowadays). – ttnphns Nov 25 '13 at 16:58
  • My instinct is that this would require delving much deeper into Stata's code than is easy or even possible. A more fundamental concern is that the criterion is of dubious relevance unless it was used to define clusters in the first place or can be related directly to cluster generation. – Nick Cox Nov 25 '13 at 19:32
  • Alright, so the Calinski/Harabasz pseudo-F and the Duda/Hart Je(2)/Je(1) index are the only two stopping criteria available in Stata by default, I guess. I was hoping there were more for agglomerative cluster algorithms. – yumba Nov 25 '13 at 21:27
  • The computation itself is rather simple. For an experienced programmer it would, I suppose, be simple in Stata too. – ttnphns Nov 25 '13 at 22:10
  • The Stata cluster command is designed to accept user-written stopping rules. Make sure you read this section of the manual, page 3; there's an example that might be useful. It requires some programming, but it doesn't look extremely complicated, so you may want to give it a try. Also, your question seems off-topic here, since you are only looking for a Stata command to compute an index. – Roberto Ferrer Nov 26 '13 at 03:13
  • Computation is explained here https://stats.stackexchange.com/q/343878/3277 – ttnphns May 08 '18 at 10:41
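The computation discussed in the comments can be sketched as follows. This is a minimal Python sketch, not Stata code and not the SPSS macro mentioned above. It assumes Euclidean distances between objects and the usual Hubert and Levin definition C = (S_w - S_min) / (S_max - S_min), where S_w is the sum of all within-cluster pairwise distances, n_w is the number of within-cluster pairs, and S_min / S_max are the sums of the n_w smallest / largest distances among all pairs. Smaller values (closer to 0) indicate a better partition. The function names `c_index` and `point_biserial` are hypothetical. The point-biserial r that ttnphns mentions is included as the correlation between pairwise distance and a 0/1 "different cluster" indicator.

```python
import math
from itertools import combinations


def _pairs(points, labels):
    """Return (distance, same_cluster) for every unordered pair of objects.

    Assumes Euclidean distance; swap in another metric if your clustering used one.
    """
    return [
        (math.dist(points[i], points[j]), labels[i] == labels[j])
        for i, j in combinations(range(len(points)), 2)
    ]


def c_index(points, labels):
    """C-Index = (S_w - S_min) / (S_max - S_min); lower values are better."""
    pairs = _pairs(points, labels)
    within = [d for d, same in pairs if same]
    n_w = len(within)  # number of within-cluster pairs
    if n_w == 0:
        raise ValueError("need at least one within-cluster pair")
    all_d = sorted(d for d, _ in pairs)
    s_w = sum(within)
    s_min = sum(all_d[:n_w])    # n_w smallest distances overall
    s_max = sum(all_d[-n_w:])   # n_w largest distances overall
    if s_max == s_min:
        raise ValueError("all pairwise distances are equal")
    return (s_w - s_min) / (s_max - s_min)


def point_biserial(points, labels):
    """Pearson r between pairwise distance and a 0 (same) / 1 (different) cluster flag."""
    pairs = _pairs(points, labels)
    x = [d for d, _ in pairs]
    y = [0.0 if same else 1.0 for _, same in pairs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(c_index(pts, [1, 1, 2, 2]))         # two tight, well-separated groups
    print(c_index(pts, [1, 2, 1, 2]))         # a poor split of the same data
    print(point_biserial(pts, [1, 1, 2, 2]))
```

To choose a number of groups, one would compute the index for the partition at each candidate k (e.g. each cut of an agglomerative dendrogram, exported from Stata as cluster-membership variables) and prefer the k with the smallest C. Because every pairwise distance is generated and sorted, the cost grows roughly with n² log n, which is consistent with the warning above about 500+ objects.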
