Clustering Algorithm with Continuous and Binary Variables

Question

I am running a clustering analysis with both continuous and binary variables and am wondering:

What type of model is best for these cases?

How should I scale my data? Should I scale only the continuous variables and keep the binary ones as 0/1? If so, should I then use a MinMaxScaler so that all my continuous variables are also in the 0-1 range?
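For concreteness, the scaling scheme I describe in the second question could be sketched roughly as follows. This is only an illustration, assuming scikit-learn; the toy array and the `continuous_cols` / `binary_cols` index lists are made up for the example and are not my real data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: columns 0-1 are continuous, column 2 is binary (0/1).
X = np.array([
    [25.0, 40000.0, 0],
    [47.0, 82000.0, 1],
    [33.0, 55000.0, 1],
    [61.0, 30000.0, 0],
])

continuous_cols = [0, 1]  # assumed positions of the continuous variables
binary_cols = [2]         # assumed positions of the binary variables

# Rescale only the continuous columns to [0, 1]; pass binary columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("cont", MinMaxScaler(), continuous_cols),
        ("bin", "passthrough", binary_cols),
    ]
)

# Output columns follow the transformer order: continuous first, then binary.
X_scaled = preprocess.fit_transform(X)
print(X_scaled)
```

Whether putting everything on the 0-1 range actually gives the binary and continuous variables comparable weight is part of what I am asking.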

I have looked at possible options, and this is what seems best from my research:

A. Using a hierarchical clustering model, such as DBSCAN, is best for this kind of mixed data.
B. Computing a matrix of Gower's similarity coefficients and then feeding it into my DBSCAN model will give the best performance.
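To make option B concrete, here is a minimal sketch of what I have in mind, assuming NumPy and scikit-learn. `gower_distance` is a hypothetical helper written for this post, not a library function. It treats the binary variables as symmetric (simple matching) and works with the Gower distance, i.e. one minus the similarity, because DBSCAN with `metric="precomputed"` expects distances. The toy data and the `eps` and `min_samples` values are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def gower_distance(X_cont, X_bin):
    """Pairwise Gower distance for continuous plus symmetric binary features."""
    X_cont = np.asarray(X_cont, dtype=float)
    X_bin = np.asarray(X_bin)
    # Each continuous feature contributes |x_i - x_j| / range of that feature.
    ranges = X_cont.max(axis=0) - X_cont.min(axis=0)
    ranges[ranges == 0] = 1.0  # a constant column contributes zero distance
    d_cont = np.abs(X_cont[:, None, :] - X_cont[None, :, :]) / ranges
    # Each binary feature contributes 1 on a mismatch and 0 on a match.
    d_bin = (X_bin[:, None, :] != X_bin[None, :, :]).astype(float)
    # Gower distance is the unweighted mean of the per-feature contributions.
    return np.concatenate([d_cont, d_bin], axis=2).mean(axis=2)


# Hypothetical toy data: two continuous variables and one binary variable.
X_cont = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.1, 49.0]]
X_bin = [[0], [0], [1], [1]]

D = gower_distance(X_cont, X_bin)

# eps and min_samples are illustrative values only, not tuned.
labels = DBSCAN(eps=0.1, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

Note that the range normalization inside the distance already puts each continuous variable on a 0-1 scale, so in this sketch no separate scaling step is applied beforehand.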

If I do use DBSCAN, what is the best method for dimension reduction? Should I first standardize, perform dimension reduction, compute Gower's similarity coefficient, and then input the result into the model?

Thank you so much for all your help. I am clearly a novice at this.

Related: How to use both binary and continuous variables together in clustering? There are also other threads on the site with relevant information. Consider searching around & reading a variety of posts. — gung - Reinstate Monica, Oct 25 '22 at 16:17
Be aware that DBSCAN is not considered a hierarchical clustering method. — gung - Reinstate Monica, Oct 25 '22 at 16:18
