I am trying to loop through columns 2 through 9 of data frame z to test each factor in a simple regression model in R. I then try to predict the 'LD50' for a test data set, 'holdout'. The model fits, but the predict function does not work. I get this warning:
Warning messages: 1: 'newdata' had 12 rows but variables found have 27 rows
for (i in 2:9) {
  fit <- lm(LD50 ~ z[[i]], data = z)
  print(summary(fit))
  predict(fit, holdout)
}
I know that the way I call the variable z[[i]] is the problem, but I cannot figure out how to call the variable within the for loop without doing it this way. I have tried lapply and I can get it to work perfectly for the simple regression model.
THIS CODE WORKS:
varlist <- names(z)[2:9]
lapply(varlist, function(x) {
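  # Build the formula LD50 ~ <column name> from the name stored in x
  fit <- lm(substitute(LD50 ~ i, list(i = as.name(x))), data = z)
  print(summary(fit))
  # Predict from the model fitted just above
  LD50_2 <- predict(fit, newdata = holdout)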
})
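The lapply version probably works because its formula refers to each column by name, so predict can look that name up in holdout. A term written as z[[i]] is instead evaluated against z itself, which would explain why 27 rows come back. Below is a minimal sketch of the same idea in a for loop, using reformulate() to build the formula from the column name. The data are made-up stand-ins: z has 27 rows, holdout has 12, and the predictor names x1 to x8 are hypothetical.

```r
# Stand-in data (hypothetical): LD50 in column 1, eight predictors in columns 2:9.
set.seed(1)
make_data <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
  names(d) <- paste0("x", 1:8)
  cbind(LD50 = rnorm(n), d)
}
z <- make_data(27)        # training data, 27 rows
holdout <- make_data(12)  # test data, 12 rows

preds <- list()
for (i in 2:9) {
  # Build LD50 ~ <column name> so predict() can find the column in newdata.
  f <- reformulate(names(z)[i], response = "LD50")
  fit <- lm(f, data = z)
  preds[[names(z)[i]]] <- predict(fit, newdata = holdout)
}

lengths(preds)  # number of predictions returned for each predictor
```

This assumes the column names are syntactically valid R names and that holdout uses the same column names as z.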
But my ultimate goal is a multiple linear regression model, and I can't figure out how to do that with lapply (any suggestions here would be welcome). I can get the models to fit with a for loop, but the predict function doesn't work. Basically, I want to cycle through columns 2:9 of data frame z and feed them into models like these:
for (i in 2:9) {
  fit <- lm(LD50 ~ z[[i]], data = z)
  print(summary(fit))
  predict(fit, holdout)
}
for (i in 2:8) {
  for (j in i+2:9) {
    if (j > 9) {
      break
    } else {
      fit <- lm(LD50 ~ z[[i]] + z[[j]], data = z)
      print(summary(fit))
      predict(fit, holdout)
    }
  }
}
for (i in 2:7) {
  for (j in i+2:8) {
    for (k in j+2:9) {
      if (k > 9) {
        break
      } else {
        fit <- lm(LD50 ~ z[[i]] + z[[j]] + z[[k]], data = z)
        print(summary(fit))
        predict(fit, holdout)
      }
    }
  }
}
For this code the models fit, but the predict function does not work and produces the same warning as in the simple linear regression case:
Warning messages: 1: 'newdata' had 12 rows but variables found have 27 rows
Does anybody have any suggestions on how to get this running properly?
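One possible direction, sketched under assumptions rather than offered as a definitive answer: enumerate the predictor combinations with combn() and build each formula from column names with reformulate(), so predict can match the names in holdout. This also sidesteps ranges such as i+2:9, which R parses as i + (2:9) because `:` binds more tightly than `+`. The data below are made-up stand-ins (27 training rows, 12 holdout rows, hypothetical predictor names x1 to x8), and fit_and_predict is a hypothetical helper.

```r
# Stand-in data (hypothetical): LD50 in column 1, eight predictors in columns 2:9.
set.seed(1)
make_data <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
  names(d) <- paste0("x", 1:8)
  cbind(LD50 = rnorm(n), d)
}
z <- make_data(27)
holdout <- make_data(12)

predictors <- names(z)[2:9]

# Hypothetical helper: fit LD50 on the named columns, then predict on holdout.
fit_and_predict <- function(vars) {
  f <- reformulate(vars, response = "LD50")  # e.g. LD50 ~ x1 + x2
  fit <- lm(f, data = z)
  predict(fit, newdata = holdout)
}

# Every model with one, two, or three predictors.
results <- list()
for (k in 1:3) {
  combos <- combn(predictors, k, simplify = FALSE)
  for (vars in combos) {
    results[[paste(vars, collapse = " + ")]] <- fit_and_predict(vars)
  }
}

length(results)  # number of models fitted
```

Unlike the nested loops above, combn() includes every pair and triple, adjacent columns too. To print each model summary, print(summary(fit)) could be added inside the helper.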