I have a survey dataset with 200 columns (encoded as numbers) and am trying to reduce the number of dimensions. PCA does reduce the dimensionality, but each principal component (PC) explains only a small share of the variance in the dataset. It takes 150 PCs to explain 85% of the variance, which doesn't really do me any favours in reducing the number of dimensions.
I can use fewer PCs and explain less of the variance, but I'd like to get the best trade-off.
I'm aiming for a maximum of 50 PCs, but that many only explain 50% of the variance.
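For context, this is roughly how I am measuring the trade-off. It is a minimal sketch using only NumPy: the random matrix is a placeholder standing in for my survey data, and `components_needed` is a name made up for illustration. It centers the columns but does not rescale them, so it assumes the columns are already on comparable scales.

```python
import numpy as np

def components_needed(X, threshold):
    """Return (k, cumulative), where k is the smallest number of PCs whose
    cumulative explained-variance ratio reaches `threshold`."""
    # Center each column; PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio is >= threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Placeholder data: 1000 respondents, 200 uncorrelated columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))

k, cumulative = components_needed(X, 0.85)
print("PCs needed for 85% of the variance:", k)
print("Variance explained by the first 50 PCs:", cumulative[49])
```

On uncorrelated placeholder data like this, the variance is spread almost evenly across components, which is the same pattern I see in my real data.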
Are there any techniques that are commonly used to get around this problem? For example, could certain columns be making it harder for PCA to explain the variance, and if so, how do I find them? Or are there alternative dimensionality reduction techniques I could use?
Edit: The reason for the dimensionality reduction is to cluster the data, so I would like to have the most variance explained with as few PCs as possible.