I have a dataset with 84 rows and 18 predictors, and I need to predict Annual_salary from it.
I built a model after converting all the categorical variables to factors. Then I used the olsrr package to run best-subset selection, which picked about 7 predictors. A model built on those predictors gives an RMSE of around 240,000.
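One thing worth checking is where that RMSE is measured. If it is computed on the same rows the model was fit on, or on a model whose predictors were chosen by best subset using all 84 rows, it will look better than it really is. A common check is k-fold cross-validation: fit on k-1 folds, score on the held-out fold, and pool the errors. Comparing against a baseline that always predicts the mean salary also tells you whether 240,000 is actually good. The sketch below is written in Python with only the standard library so it runs anywhere. All function names and the toy data are hypothetical, and it does not repeat the best-subset search inside each fold, which a fully honest estimate would need. In R, caret's `train()` with `trainControl(method = "cv", number = 5)` does the same job.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def kfold_indices(n, k, seed=42):
    """Shuffle row indices once (fixed seed) and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_rmse(fit, predict, X, y, folds):
    """Pooled RMSE over held-out folds; each row is scored by a model that never saw it."""
    preds = [0.0] * len(y)
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict(model, [X[i] for i in fold])):
            preds[i] = p
    return rmse(y, preds)

# Baseline model: always predict the training-fold mean.
def fit_mean(X, y):
    return sum(y) / len(y)

def predict_mean(model, X):
    return [model] * len(X)

# Least-squares line on a single predictor (column 0).
def fit_line(X, y):
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sxx
    return (my - slope * mx, slope)

def predict_line(model, X):
    intercept, slope = model
    return [intercept + slope * row[0] for row in X]

if __name__ == "__main__":
    # Hypothetical toy data: salary roughly linear in years of experience.
    rng = random.Random(0)
    X = [[float(years)] for years in range(1, 31)]
    y = [30000 + 4000 * row[0] + rng.gauss(0, 5000) for row in X]
    folds = kfold_indices(len(y), k=5)
    print("baseline CV RMSE:", round(cv_rmse(fit_mean, predict_mean, X, y, folds)))
    print("linear   CV RMSE:", round(cv_rmse(fit_line, predict_line, X, y, folds)))
```

Because both models are scored on the same folds, the difference between the two numbers reflects the models rather than a lucky split.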
How can I reduce this RMSE? I'm new to R and need some help. What steps should I take with this data, and in what order?
What other models (algorithms) could I use for the prediction? I have tried lasso/ridge and multiple linear regression (MLR), and MLR gave the lowest RMSE. I may not have implemented any of these correctly, so I'm not sure that comparison is valid.
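On whether the MLR vs. ridge/lasso comparison is valid: two common pitfalls are scoring the models on different random splits and penalizing unstandardized predictors. (R's glmnet standardizes by default through its `standardize = TRUE` argument.) Below is a minimal Python sketch, standard library only, of ridge regression scored on identical folds for several penalties. It assumes all predictors are numeric, so factors would first need dummy coding. It covers ridge only, because lasso has no closed form. With `lam = 0` it reduces to ordinary least squares, so the comparison with MLR is like for like. Function names and data are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ridge(X, y, lam):
    """Ridge on standardized predictors; lam = 0 is OLS. The intercept is unpenalized (y is centered)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # A constant column gets sd 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    # Normal equations: (Z'Z + lam * I) beta = Z' yc
    ztz = [[sum(r[i] * r[j] for r in Z) + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    zty = [sum(r[i] * v for r, v in zip(Z, yc)) for i in range(p)]
    return means, sds, ybar, solve(ztz, zty)

def predict_ridge(model, X):
    means, sds, ybar, beta = model
    return [ybar + sum(b * (row[j] - means[j]) / sds[j] for j, b in enumerate(beta)) for row in X]

def cv_rmse_ridge(X, y, lam, k=5, seed=42):
    """Pooled k-fold CV RMSE; the fixed seed gives every lam the same folds.
    Standardization is learned from each training fold only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq = 0.0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict_ridge(model, [X[i] for i in fold])
        sq += sum((y[i] - p) ** 2 for i, p in zip(fold, preds))
    return math.sqrt(sq / len(y))

if __name__ == "__main__":
    # Hypothetical data: 84 rows, 7 numeric predictors, only the first 3 matter.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(84)]
    y = [50000 + 20000 * r[0] - 15000 * r[1] + 8000 * r[2] + rng.gauss(0, 10000) for r in X]
    for lam in [0.0, 1.0, 10.0, 100.0]:
        print(f"lambda={lam:>6}: CV RMSE = {cv_rmse_ridge(X, y, lam):,.0f}")
```

Picking the penalty with the lowest CV RMSE (what `cv.glmnet` automates in R) is what makes the ridge/lasso vs. MLR comparison meaningful. A single untuned penalty can easily lose to plain MLR.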
Link to the data: GDrive
Update: I tried a random forest with the same 7 predictors (trained on the entire dataset) and got an RMSE of around 106k on the placement_test data. I can see these scores because this is a Kaggle competition, which reports the RMSE when I submit my predictions.
