I am running a clustering analysis on data with both continuous and binary variables, and I am wondering:
- What type of model is best for this kind of data?
- How should I scale my data? Should I scale only the continuous variables and keep the binary ones as 0/1? If so, should I use a MinMaxScaler so that all my continuous variables are also in the 0-1 range?
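To make the scaling question concrete, this is roughly what I have in mind. It is only a minimal sketch in plain Python: the column names (`age`, `income`, `is_member`) and the sample rows are made up for illustration, and the helper `min_max_scale` is my own stand-in for what a MinMaxScaler would do on a single column.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range (constant columns map to 0.0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical mixed-type data: two continuous columns and one binary column.
rows = [
    {"age": 25.0, "income": 40000.0, "is_member": 1},
    {"age": 35.0, "income": 60000.0, "is_member": 0},
    {"age": 45.0, "income": 80000.0, "is_member": 1},
]
continuous_cols = ["age", "income"]

# Copy the rows, then overwrite only the continuous columns with scaled values.
# The binary column is left untouched as 0/1.
scaled_rows = [dict(r) for r in rows]
for col in continuous_cols:
    scaled = min_max_scale([r[col] for r in rows])
    for r, s in zip(scaled_rows, scaled):
        r[col] = s
```

After this, every column lies in the 0-1 range, which is the situation I am asking about.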
I have looked at possible options, and this is what seems best from my research:
A. Using a density-based clustering model, such as DBSCAN, is best for this kind of mixed data.
B. Computing a matrix of Gower's similarity coefficients, converting it to distances (1 - similarity), and feeding that into my DBSCAN model will give the best performance.
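For option B, this is my understanding of how the Gower matrix would be computed, written as a minimal sketch in plain Python rather than with any particular library. The function name `gower_distance_matrix` and the sample data are hypothetical. I am also assuming the binary variables are symmetric (a 0/0 match counts the same as a 1/1 match) and that all variables get equal weight, which may not be the right choice for every dataset.

```python
def gower_distance_matrix(rows, continuous_idx, binary_idx):
    """Pairwise Gower distances for rows holding continuous and binary values.

    Continuous columns contribute |x - y| / range; binary columns contribute
    0 on a match and 1 on a mismatch (symmetric treatment, an assumption).
    The distance is the unweighted mean of the per-column contributions.
    """
    n = len(rows)
    # Range of each continuous column, used to normalise differences to 0-1.
    ranges = {}
    for j in continuous_idx:
        col = [r[j] for r in rows]
        ranges[j] = max(col) - min(col)

    n_features = len(continuous_idx) + len(binary_idx)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in continuous_idx:
                if ranges[j] > 0:  # a constant column contributes nothing
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            for j in binary_idx:
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            dist[a][b] = dist[b][a] = total / n_features
    return dist


# Hypothetical data: (age, income, is_member)
data = [
    (25.0, 40000.0, 1),
    (35.0, 60000.0, 0),
    (45.0, 80000.0, 1),
]
D = gower_distance_matrix(data, continuous_idx=[0, 1], binary_idx=[2])
```

As far as I can tell, a matrix like `D` could then be passed to scikit-learn's `DBSCAN` with `metric="precomputed"`, with `eps` chosen on the 0-1 Gower scale. Note that the continuous columns are already range-normalised inside the distance itself.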
If I do use DBSCAN, what is the best method for dimensionality reduction? Should I first standardize, perform dimensionality reduction, compute Gower's similarity coefficient, and then feed the result into the model?
Thank you so much for all your help. I am clearly a novice at this.