I have 33 features (x) that map to two target values (y), and I have 1,203,985 observations. Using np.shape() you can see the dimensions of x and y: x = (1203985, 33), y = (1203985, 2).
I used a random forest regressor (100 trees), but at first I only used one column of y, so y was (1203985, 1), and it worked well apart from occasional large errors. However, for my application I need the other value too, so I passed y with shape (1203985, 2) into the random forest regressor, and it gave very unusable results. Why did it work well for nx1 but not for nx2? The two columns in y are almost unrelated and have little to no effect on each other; maybe that is my mistake? If I shouldn't use a random forest here, what method (in scikit-learn) should I use for this kind of problem? And how can I analyze which algorithm I should use with respect to my data?
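To make it concrete, here is a minimal sketch of both setups, with small synthetic data standing in for my real (1203985, 33) set (the synthetic targets and sizes are just placeholders, not my actual data). I also found MultiOutputRegressor in the scikit-learn docs, which fits one independent forest per target; since my two targets are nearly unrelated, I am wondering if that is the right approach:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in for my data: 500 rows instead of 1203985
rng = np.random.default_rng(0)
X = rng.random((500, 33))
# Two nearly unrelated targets, both in [0, 1]
y = np.column_stack([X[:, 0], X[:, 1]])

# What I did: a single forest fit on the 2-column y
# (RandomForestRegressor accepts multi-output y natively)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
pred = rf.predict(X)
print(pred.shape)  # (500, 2)

# Alternative I am considering: one independent forest per target,
# which may suit targets that have little effect on each other
per_target = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=100, random_state=0)
)
per_target.fit(X, y)
pred_pt = per_target.predict(X)
print(pred_pt.shape)  # (500, 2)
```

Both variants run without errors on the synthetic data, so I assume my problem is about prediction quality rather than shapes.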
For preprocessing, I only checked whether any of my features have low variance. I did not scale my data, since my y is already in [0, 1]. This is my first machine learning project, so I thought this was adequate.
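This is roughly how I did the low-variance check, again sketched on synthetic data (the threshold of 0.0, i.e. dropping only constant features, is what I used; the column index is made up for illustration):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Synthetic stand-in: 100 rows, 33 features like my real data
rng = np.random.default_rng(0)
X = rng.random((100, 33))
X[:, 5] = 0.5  # make one column constant (zero variance)

# threshold=0.0 removes only zero-variance (constant) features
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # (100, 32): the constant column was dropped
```

Note that tree-based models like random forests do not need feature scaling anyway, which is another reason I skipped it.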
I hope my problem is clear.