I'm quite new to this and am learning R-squared and machine learning side by side, so my problem may be my programming, or my stats knowledge may be a little off.
I'm finding that after adding more predictors to a linear model fit with the caret package, R-squared appears to decrease when computed with the postResample function. It also differs from the R-squared reported by summary(model).
When adding multiple variables (carat, depth, z, and x):

summary(model) R-squared: 0.95
postResample R-squared: 0.79
If I only use one variable (carat):

summary(model) R-squared: 0.91
postResample R-squared: 0.85
I'm not sure why postResample gives such a different R-squared; it may be that I've misunderstood its purpose. I'm also not sure why it decreases as I add more variables to the model.
My code:
library(caret)
library(ggplot2)  # for the diamonds dataset

# Target
y <- diamonds$price

# Create model
model <- train(y ~ carat + depth + z + x, data = diamonds, method = "lm")
summary(model)

# Extract predicted values from the model
ypred <- fitted(model)
postResample(ypred, y)
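As far as I understand, for an ordinary least-squares fit with an intercept, the R-squared reported by summary() equals the squared correlation between the fitted and observed values, which (if I've read the caret docs correctly) is the formula postResample uses by default. So I expected the two numbers to agree on the training data. A quick base-R sanity check of that expectation, using the built-in mtcars data and no caret at all:

```r
# Plain lm fit on a built-in dataset (no caret involved)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared as reported by summary()
r2_summary <- summary(fit)$r.squared

# R-squared as the squared correlation between fitted and observed values,
# which I believe is the formula postResample() applies
r2_corr <- cor(fitted(fit), mtcars$mpg)^2

all.equal(r2_summary, r2_corr)  # TRUE for OLS with an intercept
```

Given that these agree on a plain lm fit, I don't understand where the gap between summary(model) and postResample comes from in my caret example.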