I am working on a classification task using mlr3. Besides the independent variables (features), my dataset contains several covariates. How can I select a subset of the features while keeping all of the covariates? For example, in the simple dataset below, V1-V3 are independent variables and PC1 and PC2 are covariates. How can I select among V1-V3 while keeping PC1 and PC2 in the final dataset used for training?

> data.frame(V1=1:4,V2=5:8,V3=9:12,PC1=rnorm(4),PC2=rnorm(4),Target=c("A","A","B","B"))
  V1 V2 V3         PC1        PC2 Target
1  1  5  9  0.03192998  0.3418128      A
2  2  6 10 -1.60314372  0.2134503      A
3  3  7 11 -0.90306085 -1.5662568      B
4  4  8 12 -1.42996139 -0.9007882      B
  • Hi @Dave, thanks for your quick comment! Regarding your first question: if the best subset obtained from an mlr3 filter or selector is, for instance, V1, V2 and PC1, do you mean I can keep PC2 manually? Regarding your second question: I select features to optimize model performance. – YiweiZhu Jun 04 '22 at 15:34
  • Did you read my linked post about the dangers of feature selection? – Dave Jun 04 '22 at 15:36
  • I have to confess I just read your linked post. I will try regularization then. Thank you for your advice! @Dave – YiweiZhu Jun 04 '22 at 16:34