I am working on a dataset for linear regression in R. After building the model (with the lm() function), I want to test my model on a new data point using the predict() function with a certain confidence interval. Is there a mechanism to verify whether predicting at this new data point is valid or not?
1 Answer
It is not clear what you mean by
using the predict() function with a certain confidence interval.
The predict function does not use a confidence interval to produce its prediction. It uses the coefficient estimates from the fitted model to compute the value(s) of the response variable at the values of the explanatory variable(s) that you supply through the newdata argument.
set.seed(15)
N <- 10
X1 <- rnorm(N,0,1)
X2 <- rnorm(N,0,2)
Y <- 10 + 2*X1 + 3*X2 + rnorm(N,0,1)
dt <- data.frame(Y, X1, X2)
m0 <- lm(Y ~ X1 + X2, data = dt)
summary(m0)
Here we fit the model and obtain these results:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.4936     0.2589  40.524 1.45e-09 ***
X1            2.0075     0.2792   7.191 0.000179 ***
X2            3.1711     0.1614  19.653 2.21e-07 ***
The equation for the fitted model is therefore:
10.4936 + 2.0075*X1 + 3.1711*X2
So if we were to set X1=2 and X2=3 we would have
10.4936 + 2.0075*2 + 3.1711*3
which equals 24.0219. Now if we want to
verify whether predicting at this new data point is valid or not?
We can do so like this:
predict(m0, newdata = data.frame(X1 = 2, X2 = 3))
which returns 24.02195. This matches the calculation we did manually, which verifies that the predict function worked correctly.
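The same comparison can be made without copying rounded coefficients by hand. The following is only a minimal sketch that refits the model above and rebuilds the fitted equation from coef(); the names new_point, manual and pred are illustrative and not part of the original answer.

```r
# Refit the model from the example above
set.seed(15)
N <- 10
X1 <- rnorm(N, 0, 1)
X2 <- rnorm(N, 0, 2)
Y <- 10 + 2*X1 + 3*X2 + rnorm(N, 0, 1)
dt <- data.frame(Y, X1, X2)
m0 <- lm(Y ~ X1 + X2, data = dt)

# The new data point (illustrative name)
new_point <- data.frame(X1 = 2, X2 = 3)

# Manual prediction from the unrounded coefficient estimates
b <- coef(m0)
manual <- unname(b["(Intercept)"] + b["X1"] * new_point$X1 + b["X2"] * new_point$X2)

# Prediction from predict()
pred <- unname(predict(m0, newdata = new_point))

# Compare the two up to numerical tolerance
all.equal(manual, pred)
```

Using all.equal() rather than == avoids spurious mismatches caused by floating-point rounding.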
Edit: After clarification in the comments to this answer:
I meant to ask of performance measures to conclude if 'predict()' is doing the correct thing. Like how the linear regression function 'lm()' has performance measures to quantify its behavior. Does my question make sense?
Unfortunately I still don't understand what you are asking. You say you want:
performance measures to conclude if 'predict()' is doing the correct thing.
Well, predict only has one job: to output a predicted value for the response, or a confidence (or prediction) interval around it. I have demonstrated above that it is "doing the correct thing."
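To illustrate the interval output mentioned here: for lm fits, predict() accepts interval = "confidence" or interval = "prediction", together with a level argument. The following is a sketch under the assumption that the model is the one fitted earlier in this answer; new_point, conf_int and pred_int are illustrative names.

```r
# Refit the model from the example above
set.seed(15)
N <- 10
X1 <- rnorm(N, 0, 1)
X2 <- rnorm(N, 0, 2)
Y <- 10 + 2*X1 + 3*X2 + rnorm(N, 0, 1)
dt <- data.frame(Y, X1, X2)
m0 <- lm(Y ~ X1 + X2, data = dt)

# The new data point (illustrative name)
new_point <- data.frame(X1 = 2, X2 = 3)

# 95% confidence interval for the mean response at the new point
conf_int <- predict(m0, newdata = new_point, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new observation at the new point
pred_int <- predict(m0, newdata = new_point, interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both calls return a one-row matrix with columns fit, lwr and upr. The prediction interval is wider than the confidence interval because it also accounts for the residual variance of a new observation.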
Then you ask about whether
'lm()' has performance measures to quantify its behavior
Unfortunately, I don't know what you mean by this. If you are asking about the computational performance of lm itself, most of the computation is done in compiled code written in C or Fortran for performance reasons. Further details are here:
Least Squares Regression Step-By-Step Linear Algebra Computation
If you are asking about the performance of predict or something else, please clarify your question.
I don't get the feeling that the function doing the right calculation is in question. – Dave Dec 05 '23 at 20:19
@Dave I hear you. Hopefully my answer will prompt the OP to tell us what they mean by "test the validity" – Robert Long Dec 05 '23 at 20:23
@MichaelM I was being deliberately obtuse, hoping to elicit more details from the OP about what they are asking :) – Robert Long Dec 05 '23 at 21:58
I meant to ask of performance measures to conclude if 'predict()' is doing the correct thing. Like how the linear regression function 'lm()' has performance measures to quantify its behavior. Does my question make sense? – Nanda Dec 06 '23 at 00:09
@Nanda Please clarify that as an edit to your original question. If you could expand on what you see as measures of performance in the lm function, that would be helpful. – Dave Dec 06 '23 at 11:33
predict.lm(). – Michael M Dec 05 '23 at 20:59