3

I'm a first-year statistics graduate student taking a course in regression. In the previous chapter, we discussed partial F-tests for deciding whether to include a predictor variable. In the current chapter (which we just finished), we covered six model selection criteria. I was expecting these two concepts to be linked together at some point, but there's nothing in the book about it. Does anyone know what the relationship is between partial F-tests and model selection? To me, it looks like partial F-tests should be considered a model selection criterion.

Biomath
  • 291
  • 1
    (Just in case one of them turns out to be equivalent) What were the model selection criteria that were included? – Glen_b Feb 28 '14 at 00:17
  • I'm talking about using the R^2 criterion, the MSE criterion, Mallows' C_p, AIC, BIC, and PRESS_p. – Biomath Feb 28 '14 at 00:50
  • 1
    How were the R^2 and MSE criteria set up for evaluating adding single variables? One of them might be equivalent to the partial F. – Glen_b Feb 28 '14 at 00:52
  • It's very possible you're right about that. R^2 is set up as just seeing when adding in new predictors doesn't add much of a benefit in R^2. In the MSE we look to reduce MSE as far as possible. So there's an explicit penalty. – Biomath Feb 28 '14 at 00:57

3 Answers

2

All the automatic procedures for selecting the best model, including "Forward Selection", "Backward Elimination", and "Stepwise Regression", are (in principle) based on partial F-tests. In other words, the inclusion or exclusion of variables is assessed by partial F-tests. To see the exact algorithm for each of the methods mentioned above, you can have a look at Section 10.2.2 of Introduction to Linear Regression Analysis, 5th Edition, by Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining. I should add that these procedures can also be driven by other criteria such as AIC, as in the step function in R. However, Paul Rubin wrote a function in R here, called stepwise, that uses partial F-tests rather than AIC.
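
To make the mechanics concrete, here is a rough sketch in R of what a single step of such a procedure looks like; the data frame and variable names below are made up purely for illustration.

    # Made-up data, just for illustration.
    set.seed(1)
    dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
    dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(50)

    # Partial F-test for adding x3 to a model that already contains x1 and x2:
    # compare the two nested models directly with anova().
    fit_small <- lm(y ~ x1 + x2, data = dat)
    fit_big   <- lm(y ~ x1 + x2 + x3, data = dat)
    anova(fit_small, fit_big)   # the F statistic here is the partial F-test

    # add1()/drop1() run the same comparison for every candidate variable at once,
    # which is essentially one step of forward selection / backward elimination.
    add1(fit_small, scope = ~ x1 + x2 + x3, test = "F")
    drop1(fit_big, test = "F")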

Stat
  • 7,486
  • Not necessarily. You can (and most packages do) run those algorithms using other metrics. For instance, R's step command uses AIC and does not calculate F-tests. It calculates the AIC for the model that would result from removing each variable in turn, and moves to the model which reduces AIC the most. Rinse and repeat until you can't reduce AIC further. No F-tests required (see the sketch after this comment thread). – David Marx Feb 28 '14 at 05:34
  • 1
    I didn't say it is required! But if you open any standard regression textbook, it will explain these procedures using the partial F-test and not AIC ... Cheers! And one more thing, you can see the function written by Paul Rubin here to see how this can be implemented in R. – Stat Feb 28 '14 at 05:38
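
A minimal sketch of the AIC-driven stepping described in the comments above, again with made-up data:

    # Made-up data, just to show step() in action.
    set.seed(2)
    dat <- data.frame(x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
    dat$y <- 1 + 2 * dat$x1 + rnorm(80)

    full <- lm(y ~ x1 + x2 + x3, data = dat)

    # step() computes the AIC of each model obtained by dropping one term,
    # moves to whichever drop lowers AIC the most, and repeats until no
    # single drop improves AIC. No F-tests are computed along the way.
    step(full, direction = "backward")
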
1

These criteria are all motivated by different things. In particular, most of them reflect a desire to produce a "sparse" model. Basically, relatively recent findings in statistics have shown that if you can reduce the complexity of your model without significantly changing its explanatory power, the simpler model will generally give better predictions (i.e. be less susceptible to overfitting).

As Stat pointed out, you can run these selection procedures using partial F-tests. But if we go that route, we're not driving towards parsimonious (lower-complexity) models specifically: we're looking instead for models that don't contain any non-significant variables.

Let's say instead that we are less concerned about the significance of our variables (this is sort of a weird thing to say, but bear with me) and want to focus on simplifying our models. Then it might be useful to base our model selection criteria on information theory, which is where AIC comes from.
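
Concretely, for a model with $k$ estimated parameters and maximized likelihood $\hat{L}$, $\mathrm{AIC} = 2k - 2\ln\hat{L}$: each extra parameter has to buy enough improvement in fit to offset the $2k$ penalty term.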

Now, let's say we're Bayesians. Then, for reasons I don't entirely understand, the BIC is the criterion motivated by the Bayesian approach. I'll come back to this post and fill out the BIC discussion later or something. Just trust me that, although it's similar to AIC, the underlying motivation behind the metric is Bayesian.
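
For comparison, $\mathrm{BIC} = k\ln n - 2\ln\hat{L}$ for a sample of size $n$, so its complexity penalty grows with the sample size and it tends to prefer smaller models than AIC does.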

Let's say we really just want to focus on the predictive power of our models. We can't trust the models not to be overfitted if we're just comparing them against our training data, so maybe we split our data into a training set and a test set and compare models based on test-set error, or we use k-fold cross validation to estimate the error on unobserved data. Maybe we decide that leave-one-out cross validation is suited to our needs: this is essentially the PRESS statistic.
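
For a linear model, PRESS doesn't even require refitting the model $n$ times: the leave-one-out residuals can be recovered from the hat values. A small sketch in R, with made-up data:

    # Made-up data for illustration.
    set.seed(3)
    dat <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
    dat$y <- 1 + dat$x1 + rnorm(60)

    fit <- lm(y ~ x1 + x2, data = dat)

    # The leave-one-out residual for observation i is e_i / (1 - h_ii),
    # so PRESS is the sum of the squared deflated residuals.
    press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
    press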

And so on and so forth.

Each of these model selection metrics is motivated by a different model selection approach, and your model selection strategy would probably be best served by not relying on a single statistic. You absolutely can include F-tests in your approach. Maybe you have a few candidate models that you arrived at via backward elimination using AIC, forward selection using AIC, and best subsets over some range of $N$ parameters. You notice that many of your candidate models have a particular variable in common, but your model with the lowest AIC does not. You perform some F-tests on various models adding/subtracting that variable and find that it is significant, even when added to the "best" model from which it was removed. Maybe you reconsider removing this variable. But you later find that your cross-validation error is noticeably lower when you remove this variable, and so you ultimately decide to leave it out, with multiple justifications for doing so.
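
As a toy version of that kind of workflow, here is a sketch in R (the data and the disputed variable x3 are invented purely for illustration) that runs both the partial F-test and a leave-one-out comparison for the same pair of candidate models:

    # Invented data: x3 is the variable whose inclusion is in question.
    set.seed(4)
    dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
    dat$y <- 1 + dat$x1 + 0.3 * dat$x3 + rnorm(100)

    with_x3    <- lm(y ~ x1 + x2 + x3, data = dat)
    without_x3 <- lm(y ~ x1 + x2, data = dat)

    # Partial F-test for x3 given the other terms.
    anova(without_x3, with_x3)

    # Leave-one-out (PRESS-style) comparison of predictive error.
    # loo_mse is a helper defined here just for this sketch.
    loo_mse <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)
    c(with_x3 = loo_mse(with_x3), without_x3 = loo_mse(without_x3))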

What I'm trying to get at is that any one of these model selection algorithms/criteria can produce a set of candidate models. These sets of candidates will overlap to some extent, but they will also differ. You should use all the tools available to you that are relevant to the problem to decide which model is best for your needs, and which variables are acceptable to remove from your model and which are not.

tl;dr: Different metrics are motivated by different model selection strategies. Generally, you shouldn't rely on a single model selection metric. Build a set of candidate models using various algorithms and techniques, rate these models using several different metrics, and figure out which model is best for you by weighing the pros and cons those tools reveal.

David Marx
  • 7,127
0

Model selection deals with selecting the best model out of a set of reasonable ones. An F-test is about rejecting a model. It doesn't tell you which of the models that are not rejected is better.

Aksakal
  • 61,310
  • 1
    I was not talking about F-tests for the whole model. I'm talking about partial F-tests that test whether a specific predictor variable's effect can be distinguished from 0. – Biomath Feb 28 '14 at 00:49
  • It's the same thing. Let's say you start with 100 variables, run partial F-tests, and reject 98 of them. Does this mean that the remaining two (not rejected) variables should be left in the model? No. The tests can't tell you whether it's better to keep one variable or two. You'd use model selection criteria to decide whether to keep one of the two variables or both. – Aksakal Feb 28 '14 at 01:06
  • So is it that I should start with removal of variables with partial F-tests, and then use the other model selection criteria to decide whether to keep the remaining ones? – Biomath Feb 28 '14 at 01:14
  • Not necessarily, because F-tests are used for all sorts of hypothesis testing. For instance, you could test whether two coefficients are equal, or whether one is greater than the other, etc. You could be tempted to use an F-test's p-value as a model selection criterion, but that's not cool. You obviously want to select among models which have significant variables in them, but even that is not always possible. Sometimes the data is bad enough that you may opt to keep a variable which is rejected by the F-test. It all gets hairy once you step out of a classroom. – Aksakal Feb 28 '14 at 01:29
  • I see that there's a difference, but I'm still not sure I really grasp it. If you happen to know of a good text that discusses this, I'd be interested in reading about it. Thanks for the help. – Biomath Feb 28 '14 at 01:34