I have a data frame, "Customer_original", with customer data. It has around 50 variables of mixed types.

I standardize "Customer_original" and put the standardized values into the "Customer_standardize" data frame.

I then compute Gower's distance on the "Customer_standardize" data frame and get the "Customer_Gowers" data frame.

I use "Customer_Gowers" to perform hierarchical clustering and decide to use 4 clusters, which gives me a column of cluster memberships.

Now I want to interpret the clusters and wonder which data frame I should use for that. I was thinking of adding the cluster membership column to my "Customer_original" data frame. Is that a correct way of doing it?

1 Answer

If you've decided on a number of clusters, then you look at cluster membership and see how they vary on the variables of interest.

I'm not sure which data frame that is, or if it is any of the ones you've got.

But... you first have to decide on a number of clusters. There are various methods, but I like to look at several solutions and see which one makes sense to me or, even better, gives me an "aha" moment.
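
Comparing several solutions can be sketched as cutting one hierarchical tree at different cluster counts and looking at what comes out. The sketch below is only an illustration under stated assumptions: it uses SciPy, average linkage (the question does not say which linkage was used), and a made-up six-point distance matrix standing in for "Customer_Gowers".

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy stand-in for the Customer_Gowers dissimilarity matrix: pairwise
# absolute differences between six made-up one-dimensional points.
points = np.array([0.0, 0.1, 4.0, 4.2, 10.0, 10.3])
distances = np.abs(points[:, None] - points[None, :])

# linkage() expects the condensed (upper-triangle) form of the matrix.
# Average linkage is an assumption, not the asker's stated choice.
tree = linkage(squareform(distances), method="average")

# Cut the same tree at several cluster counts and record the cluster sizes.
sizes_by_k = {}
for k in range(2, 5):
    labels = fcluster(tree, t=k, criterion="maxclust")
    sizes_by_k[k] = sorted(Counter(labels.tolist()).values(), reverse=True)
    print(k, sizes_by_k[k])
```

Cluster sizes are only a first check. Each candidate solution would still need to be profiled on the variables of interest before deciding which one makes sense.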

Peter Flom
  • I have decided on the number of clusters. What I don't know is how to see how the variables vary depending on the cluster. That is, take cluster 1: which customer characteristics describe that cluster, and how do they differ between, say, cluster 1 and cluster 2? That's what I want to achieve. – ExchangedVisual111 Mar 24 '24 at 21:45
  • You look at the means, SDs, etc. (or frequencies, or medians and IQRs) of those variables by cluster. How exactly you do this depends on what software you are using. But software questions are off topic here. – Peter Flom Mar 24 '24 at 21:54
  • Yes, but then I need to insert the cluster membership into a data frame. I have several data frames, and each would lead to a different interpretation. The easiest would be "Customer_original", but I am unsure whether that's how you do it. Maybe the right way is to insert it into "Customer_Gowers", since I used that data frame to cluster? – ExchangedVisual111 Mar 24 '24 at 22:30
  • Whichever one has the variables you care about. Probably "Customer_original". – Peter Flom Mar 25 '24 at 09:58
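
The by-cluster summaries described in the comments can be sketched as follows. This is a minimal illustration using pandas, not a prescribed method: the column names, values and cluster labels are all made up, and it assumes the labels are in the same row order as the data that was clustered. It attaches the labels to the unstandardized data because those values stay in interpretable units. A Gower distance matrix holds pairwise dissimilarities between customers rather than customer characteristics, so it has no variables to summarize.

```python
import pandas as pd

# Toy stand-in for Customer_original (hypothetical columns and values).
customer_original = pd.DataFrame({
    "age": [23, 35, 58, 61, 29, 44],
    "income": [28000, 52000, 61000, 75000, 31000, 48000],
    "segment": ["web", "store", "store", "store", "web", "web"],
})

# Hypothetical cluster labels; they must be in the same row order as
# the data that was clustered, or the summaries will be meaningless.
cluster = [1, 2, 3, 3, 1, 2]

# Attach the labels to the original, unstandardized data.
profiled = customer_original.assign(cluster=cluster)

# Numeric variables: mean, standard deviation and median per cluster.
numeric_summary = profiled.groupby("cluster")[["age", "income"]].agg(
    ["mean", "std", "median"]
)

# Categorical variables: share of each level within each cluster.
segment_share = pd.crosstab(
    profiled["cluster"], profiled["segment"], normalize="index"
)

print(numeric_summary)
print(segment_share)
```

Reading the two tables side by side shows how each cluster differs from the others on the original variables.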