How do you get lmFuncs functions of the rfe function in caret to do a logistic regression?

Question

I've been experimenting with the rfe function in the caret package to do logistic regression with feature selection. I used the lmFuncs functions with the following rfeControl:

ctrl <- rfeControl(functions = lmFuncs, method = 'cv', rerank=TRUE, saveDetails=TRUE, verbose = TRUE, returnResamp = "all", number=100)

Below is the structure of the rfe call:

fit.rfe=rfe(df.preds,df.depend, metric='RMSE',sizes=c(5,10,15,20), rfeControl=ctrl)

df.preds is a data frame of inputs to the model. df.depend is a vector of 1 or 0 corresponding to each row in df.preds to indicate response.

The resulting model, accessed through the fit element of the rfe object, is of class lm. It produces predicted values below zero and above one when I call predict as follows:

predict(fit.rfe$fit,df,type='response')

Given that I'm expecting a logistic regression, all predicted values should be greater than zero and less than one.

Any help will be appreciated.

Using 100-fold cross-validation is not advisable. Practically speaking, each fold's training set is going to be 99% similar to any other fold's. Use a smaller number of folds (like 5 or 10) so that more data is held out in each fold, and if you want repeats, specify repeats=. Otherwise you won't really simulate how the technique performs on new data. See this question for a comparison of k-fold and leave-one-out CV, which is what you approach as number= goes up. – C8H10N4O2, Apr 19 '16 at 12:44
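
A minimal sketch of the control object this comment points toward, assuming caret's rfeControl with method = "repeatedcv". The values number = 10 and repeats = 5 are illustrative choices, not figures from the thread:

```r
library(caret)

# 10-fold cross-validation repeated 5 times, instead of a single 100-fold run.
# number = 10 and repeats = 5 are illustrative values.
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   number = 10,
                   repeats = 5,
                   rerank = TRUE,
                   returnResamp = "all")
```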

O_Devinyak · Accepted Answer · 2012-09-23T19:19:38.743

6

lmFuncs fits a linear regression. Just type lmFuncs$fit to see its definition. Try overriding it:

lmFuncs$fit <- function(x, y, first, last, ...) {
  tmp <- as.data.frame(x)
  tmp$y <- y
  glm(y ~ ., data = tmp, family = binomial)
}

Note that I don't know how to attach <environment: namespace:caret> or what it means. Try this trick on your data and comment with the result.
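
To illustrate why the fitting function matters, here is a small base-R sketch on simulated data (the data and all variable names are hypothetical, not from the question). It fits the same 0/1 response with lm and with glm(family = binomial) and compares the ranges of the predictions:

```r
set.seed(42)

# Simulated data (hypothetical): one predictor, binary 0/1 response.
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(3 * x))
d <- data.frame(x = x, y = y)

fit_lm  <- lm(y ~ x, data = d)                      # linear regression
fit_glm <- glm(y ~ x, data = d, family = binomial)  # logistic regression

p_lm   <- predict(fit_lm, newdata = d)                     # unbounded fitted values
p_prob <- predict(fit_glm, newdata = d, type = "response") # probabilities
p_link <- predict(fit_glm, newdata = d)                    # default type = "link": log-odds

range(p_lm)    # can fall outside [0, 1]
range(p_prob)  # stays strictly between 0 and 1
range(p_link)  # log-odds scale, not probabilities
```

Note that predict on a glm object returns log-odds unless type = "response" is given, so the type argument matters as much as the model class.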

edited Sep 23 '12 at 19:19

answered Sep 23 '12 at 18:54


Thank you for your response. I get an error when I try to rewrite. lmFuncs$fit<-function (x, y, first, last, ...){ tmp <- as.data.frame(x) tmp$y <- y glm(y ~ ., data = tmp,family=binomial) } Error: unexpected symbol in "lmFuncs$fit<-function (x, y, first, last, ...){ tmp <- as.data.frame(x) tmp" – ansek Sep 23 '12 at 19:08
The original code, with its line breaks, runs fine. I have just edited my answer. – O_Devinyak Sep 23 '12 at 19:16

I got it to work by defining a new object called glmFuncs using the following code: `glmFuncs <- lmFuncs; glmFuncs$fit <- function(x, y, first, last, ...) { tmp <- as.data.frame(x); tmp$y <- y; glm(y ~ ., data = tmp, family = binomial(link = 'logit')) }`. It is doing exactly what I want it to do. Thank you so much. – ansek Sep 23 '12 at 19:35
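
A sketch of how the glmFuncs approach from this comment might look end to end, on simulated data (hypothetical; the data, sizes, and fold count are illustrative). Overriding glmFuncs$pred is an addition that does not appear in the thread: it assumes that the inherited lmFuncs$pred calls predict without a type argument, which for a glm object would return log-odds rather than probabilities during resampling.

```r
library(caret)

set.seed(1)

# Simulated data (hypothetical): 8 predictors, binary 0/1 response.
n <- 200
df.preds <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
df.depend <- rbinom(n, 1, plogis(df.preds$V1 - df.preds$V2))

# Start from lmFuncs and swap in a logistic fit.
glmFuncs <- lmFuncs
glmFuncs$fit <- function(x, y, first, last, ...) {
  tmp <- as.data.frame(x)
  tmp$y <- y
  glm(y ~ ., data = tmp, family = binomial(link = "logit"))
}

# Assumption: return probabilities so resampled RMSE is on the 0-1 scale.
glmFuncs$pred <- function(object, x) {
  predict(object, newdata = as.data.frame(x), type = "response")
}

ctrl <- rfeControl(functions = glmFuncs, method = "cv", number = 10)

fit.rfe <- rfe(df.preds, df.depend,
               metric = "RMSE",
               sizes = c(2, 4, 6),
               rfeControl = ctrl)

# The final model is a glm, so type = "response" gives probabilities.
p <- predict(fit.rfe$fit, newdata = df.preds, type = "response")
range(p)
```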
It's telling me {: task 6 failed - "Results do not have equal lengths" – nigelhenry Mar 10 '23 at 14:43
