I am currently working with a dataset of about 300,000 records. Several columns contain a wide variety of categories, so one-hot encoding them naturally pushes the number of features into the thousands.
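For context, a minimal sketch of how the feature count grows, assuming scikit-learn's `OneHotEncoder` is used. The DataFrame and its column names (`col_a`, `col_b`, `col_c`) are hypothetical stand-ins for the real data. Each categorical column contributes one output column per distinct category, and the encoder returns a sparse matrix by default.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real data: 3 categorical columns with many levels.
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "col_a": rng.integers(0, 500, n_rows).astype(str),
    "col_b": rng.integers(0, 300, n_rows).astype(str),
    "col_c": rng.integers(0, 200, n_rows).astype(str),
})

# OneHotEncoder returns a SciPy sparse matrix by default, so the wide
# result stays memory-friendly even with thousands of columns.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(df)

# One output column per distinct category across all input columns.
n_levels = sum(df[c].nunique() for c in df.columns)
print(X.shape, n_levels)
```

Each row has exactly one non-zero per original column, which is why the matrix is very sparse even though it is wide.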
Is there a heuristic for finding the optimal n_components for SVD, i.e. the value that minimises the MSE of the linear regression model the decomposed data is then fed into?
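A minimal sketch of the setup described above, assuming scikit-learn's `TruncatedSVD` feeding `LinearRegression` (the usual choice for sparse one-hot data). The data, the candidate list `[25, 50, 100, 200]`, and `cv=3` are hypothetical choices for illustration. It shows two cheap heuristics: reading the cumulative explained variance from a single SVD fit, and cross-validating a short, roughly geometric list of candidates instead of a wide random search. Explained variance only describes `X`, so it is a proxy and not a guarantee of minimal regression MSE.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the one-hot encoded data: a sparse 0/1 matrix
# plus a synthetic target. Replace X and y with the real data.
rng = np.random.default_rng(0)
n_rows, n_features = 2_000, 400
X = sparse.random(n_rows, n_features, density=0.01, format="csr", random_state=0)
X.data[:] = 1.0
y = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_rows)

# Heuristic 1: fit a single SVD with the largest candidate size and read the
# cumulative explained variance ratio for every smaller size from it.
candidates = [25, 50, 100, 200]
svd = TruncatedSVD(n_components=max(candidates), random_state=0).fit(X)
cum_var = np.cumsum(svd.explained_variance_ratio_)

# Heuristic 2: cross-validate only a handful of candidates, keeping SVD
# inside the pipeline so it is refit on each training fold.
results = {}
for k in candidates:
    model = make_pipeline(TruncatedSVD(n_components=k, random_state=0), LinearRegression())
    scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")
    results[k] = -scores.mean()
    print(f"k={k:4d}  cum. explained var={cum_var[k - 1]:.3f}  CV MSE={results[k]:.4f}")

best_k = min(results, key=results.get)
print("best n_components:", best_k)
```

Once the coarse grid shows where the MSE curve flattens, a second, narrower grid around that point is usually far cheaper than an unbounded random search.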
I have tried a random search, but after 10 hours it is still nowhere near complete.