I have 5 features in my data. The R squared value when I use features 1, 2, and 3 is $x$, and the R squared value when I use features 1, 3, and 4 is $x + 0.1$.
Does this mean my second model is better than the first model?
The answer comes down to what you mean by "better." $R^2$ is an appropriate measure of goodness of fit in an ordinary least squares (OLS) regression, provided you are confident that all the conditions needed for its application hold.
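For reference, the usual definition is

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

the fraction of the variance in $y$ explained by the fitted model.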
Here is a simple example to illustrate the point.
A response variable $y$ is plotted on the vertical axes against two (uncorrelated) explanatory variables (both of which exhibit the same range from $1$ through $8$). The univariate least-squares fits and $R^2$ values are shown. You decide which is the better model. Is the issue really settled by a mere comparison of the $R^2$ values?
SuperUser, here are a couple of examples along the lines of what Galen suggested.
For example, one model may be chasing (fitting) the noise more closely than the other. One could assess predictive R squared (a form of leave-one-out cross-validation), and it is possible that the model with the higher R squared has a lower predictive R squared. R squared is highly dependent on the particular dataset and may not reflect the model's ability to predict new data.
Here is a general introductory discussion of R squared vs Predictive R squared with some good links: https://www.datasciencecentral.com/alternatives-to-r-squared-with-pluses-and-minuses
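For a quick illustration (using R's built-in mtcars data purely as an example, not your data), a leave-one-out predictive R squared can be computed from the PRESS statistic:

# Leave-one-out (PRESS-based) predictive R squared, illustrated on mtcars
data(mtcars)
fit <- lm(mpg ~ disp + hp, data = mtcars)
# The leave-one-out residual equals the ordinary residual / (1 - leverage)
press <- sum((residuals(fit) / (1 - lm.influence(fit)$hat))^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
pred_r2 <- 1 - press / tss
pred_r2

If pred_r2 is much lower than the ordinary R squared, that is a sign the model is fitting noise rather than signal.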
Another example: let's say the cost of obtaining feature 4 is high (it is difficult to get), while feature 2 is easier (less costly) to obtain. The first model could then be "better" even though its R squared is not as high.
I would say you can't conclude that the second model is better based on R squared alone. For example, if we look at the mtcars dataset in R, we can see that even including a random variable with no relation to the response increases the R squared.
set.seed(1)
data("mtcars")

# Model with two real predictors
two_pred <- lm(mpg ~ disp + hp, data = mtcars)
summary(two_pred)$r.squared  # 0.748

# Add a predictor that is pure noise, unrelated to mpg
random <- rnorm(nrow(mtcars))
two_pred_random <- lm(mpg ~ disp + hp + random, data = mtcars)
summary(two_pred_random)$r.squared  # 0.752
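As a related sanity check on the same two models, adjusted R squared (also reported by summary) penalizes extra terms and need not rise when the noise variable is added:

summary(two_pred)$adj.r.squared         # penalized for number of predictors
summary(two_pred_random)$adj.r.squared  # typically lower or barely changed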