I am working on a clustering project with a dataset that has some numerical variables and one categorical variable with very high cardinality (~200 values). I am wondering whether it is possible to create an embedding for that feature alone, after one-hot encoding (OHE) it. I initially thought of running an autoencoder on the 200 dummy features that result from the OHE, but I suspect this may not make sense because the dummies are mutually exclusive and nearly uncorrelated, so there is little shared structure to compress. What do you think about this?
Along the same lines, I think that applying PCA is likely wrong. What would you suggest for finding a latent representation of that variable?
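To spell out my doubt about PCA, here is a back-of-the-envelope derivation. It assumes the categories are roughly equally frequent, which may not hold in my data. If $x$ is the one-hot vector of a variable with $K$ levels and level probabilities $p$, then

$$\operatorname{Cov}(x) = \operatorname{diag}(p) - p\,p^\top .$$

With $p_i = 1/K$ for every level this becomes

$$\operatorname{Cov}(x) = \frac{1}{K}\, I - \frac{1}{K^2}\, \mathbf{1}\mathbf{1}^\top ,$$

whose eigenvalues are $1/K$ (with multiplicity $K-1$) and $0$. So every principal component would explain the same share of variance, and keeping $d$ components would retain only $d/(K-1)$ of it, e.g. about 5% for $d = 10$ and $K = 200$.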
Another idea: I could use the 200 OHE dummy columns to train a neural network on some downstream classification task, with an embedding layer included, and then use that layer as a low-dimensional representation. Does that make sense?
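To make the idea concrete, here is a minimal sketch in plain Python of what I have in mind. Everything in it is an assumption for illustration: the synthetic labels, `EMBED_DIM`, and all function and variable names are made up, and in practice I would presumably use a framework's embedding layer rather than hand-written SGD. It relies on the fact that multiplying a one-hot vector by a weight matrix simply selects one row, so the "embedding layer" is just a table indexed by category.

```python
import math
import random

random.seed(0)

N_CATEGORIES = 200  # cardinality of the categorical variable
EMBED_DIM = 4       # hypothetical embedding size

# Synthetic stand-in data (assumption): each category belongs to a hidden
# binary group, and the downstream label is that group.
hidden_group = [c % 2 for c in range(N_CATEGORIES)]
data = [(c, hidden_group[c])
        for c in (random.randrange(N_CATEGORIES) for _ in range(4000))]

# Embedding table with one row per category. Multiplying a one-hot vector by
# this matrix just selects one row, so an index lookup is equivalent.
embedding = [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
             for _ in range(N_CATEGORIES)]
weights = [random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
params = {"bias": 0.0}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(cat):
    # Look up the category's embedding, then apply a logistic output unit.
    row = embedding[cat]
    z = params["bias"] + sum(w * e for w, e in zip(weights, row))
    return sigmoid(z)


def mean_log_loss():
    eps = 1e-12
    total = 0.0
    for cat, y in data:
        p = min(max(predict(cat), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(epochs=10, lr=0.1):
    for _ in range(epochs):
        random.shuffle(data)
        for cat, y in data:
            g = predict(cat) - y  # gradient of the log loss w.r.t. the logit
            row = embedding[cat]
            for k in range(EMBED_DIM):
                w_k, e_k = weights[k], row[k]
                weights[k] -= lr * g * e_k  # update the classifier weight
                row[k] -= lr * g * w_k      # update this category's embedding
            params["bias"] -= lr * g


loss_before = mean_log_loss()
train()
loss_after = mean_log_loss()
print(f"log loss: {loss_before:.3f} -> {loss_after:.3f}")

# The learned rows are the low-dimensional representation of each category.
category_vectors = embedding
```

After training, `category_vectors[c]` would be the 4-dimensional vector I would append to the numerical features before clustering. My concern is that such vectors only encode whatever the chosen downstream label needs (in this toy case, the hidden group), so I am not sure how suitable they are for unsupervised clustering.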
Thank you in advance!