
I am building a clustering analysis on data with both continuous and binary variables, and I am wondering:

  1. What type of model is best for this kind of data?
  2. How should I scale my data? Should I scale only the continuous variables and keep the binary ones as 0/1? If so, should I use a MinMaxScaler so that all my continuous variables are also in the 0-1 range?
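
To make question 2 concrete, here is a minimal sketch of the scaling I have in mind, written in plain Python so it does not depend on any library. The toy columns `age`, `income`, and `owns_car` and the helper `min_max_scale` are made up for illustration; in practice I would expect scikit-learn's `MinMaxScaler` to do the same job on the continuous columns.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the 0-1 range (min-max scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical toy data: two continuous columns and one binary column.
age = [20, 30, 40]
income = [1000.0, 3000.0, 2000.0]
owns_car = [0, 1, 1]

# Scale only the continuous columns; the binary column stays as 0/1.
scaled_age = min_max_scale(age)
scaled_income = min_max_scale(income)

# Reassemble the rows so every feature now lies in [0, 1].
rows = list(zip(scaled_age, scaled_income, owns_car))
print(rows)
```

After this step every feature lies between 0 and 1, but I am not sure whether that alone puts a continuous difference and a binary mismatch on a comparable footing.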

I have looked at the possible options, and this is what seems best from my research:

A. Using a density-based clustering model, such as DBSCAN, is best for this kind of mixed data.
B. Computing a matrix of Gower's similarity coefficients and feeding it into my DBSCAN model will give the best performance.
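
As I understand option B, it could be sketched roughly as below. This is my own plain-Python illustration, not a library implementation: the function name `gower_distance_matrix`, the `is_binary` flags, and the toy rows are all hypothetical. I am also assuming the binary variables are symmetric (a 0/0 match counts as agreement) and that all features are weighted equally. Continuous features contribute their absolute difference divided by the feature's range, and the result is a distance, i.e. 1 minus Gower's similarity.

```python
def gower_distance_matrix(rows, is_binary):
    """Pairwise Gower distances for rows mixing continuous and binary features.

    rows      -- list of equal-length tuples of numbers
    is_binary -- one flag per feature, True where the feature is 0/1
    Assumes symmetric binary features and equal feature weights.
    """
    n = len(rows)
    n_features = len(is_binary)

    # Range of each feature, used to normalise continuous differences.
    ranges = []
    for j in range(n_features):
        col = [row[j] for row in rows]
        ranges.append(max(col) - min(col))

    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(n_features):
                if is_binary[j]:
                    # Simple matching: 0 if equal, 1 if different.
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
                elif ranges[j] > 0:
                    # Range-normalised absolute difference, always in [0, 1].
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            dist[a][b] = dist[b][a] = total / n_features
    return dist


# Hypothetical toy data: (age, income, owns_car), unscaled.
data = [(20, 1000.0, 0), (30, 3000.0, 1), (40, 2000.0, 1)]
dist = gower_distance_matrix(data, is_binary=[False, False, True])
for row in dist:
    print([round(d, 3) for d in row])
```

Since the continuous part is already divided by the range, this looks equivalent to min-max scaling those columns first. My plan would then be to pass such a matrix to scikit-learn's `DBSCAN` with `metric="precomputed"`, which expects distances rather than similarities.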

If I do use DBSCAN, what is the best method for dimension reduction? Should I first standardize, then perform dimension reduction, then compute Gower's similarity coefficient, and finally feed the result into the model?

Thank you so much for all your help; I am clearly a novice at this.
