Number of principal components when preprocessing using PCA in caret package in R

Question

I am using the caret package in R to train binary SVM classifiers. To reduce the number of features, I preprocess with PCA using the built-in option preProc = c("pca") when calling train(). Here are my questions:

How does caret select principal components?
Is there a fixed number of principal components that is selected?
Are principal components selected by some amount of explained variance (e.g. 80%)?
How can I set the number of principal components used for classification?
(I understand that PCA should be part of the outer cross-validation to allow reliable prediction estimates.) Should PCA also be implemented in the inner cross-validation cycle (parameter estimation)?
How does caret implement PCA in the cross-validation?

Useful information can be found in this post on PCA and k-fold cross-validation in caret package in R. — Ekaba Bisong, Dec 07 '16 at 20:29

score 14 · Accepted Answer · edited Mar 24 '19 at 12:44

By default, caret keeps as many principal components as are needed to explain 95% of the variance.
You can change this cutoff with the thresh parameter.

# Example
preProcess(training, method = "pca", thresh = 0.8)
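
To make the cutoff concrete, here is a minimal base-R sketch of a cumulative-variance rule of the kind thresh describes. It uses the built-in iris data and a hypothetical helper n_components(), neither of which comes from the original post. caret's internal rule may differ in edge cases, so treat this as an illustration rather than its exact implementation.

```r
# Base-R sketch of a cumulative-variance cutoff (assumed to mirror what thresh does).
x <- iris[, 1:4]                                  # example data, not from the post
pca <- prcomp(x, center = TRUE, scale. = TRUE)    # PCA on centered, scaled predictors

# Share of total variance carried by each component, then the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Hypothetical helper: smallest number of components whose cumulative
# variance reaches the cutoff.
n_components <- function(cum_var, thresh) which(cum_var >= thresh)[1]

n_95 <- n_components(cum_var, 0.95)   # components needed for 95% of the variance
n_99 <- n_components(cum_var, 0.99)   # a stricter cutoff keeps more components
```

Raising the cutoff can only keep the same number of components or more, which is why a lower thresh gives a stronger reduction.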

You can also set a particular number of components by setting the pcaComp parameter.

# Example
preProcess(training, method = "pca", pcaComp = 7)

If you set both parameters, pcaComp takes precedence over thresh.
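
Since the question calls train() rather than preProcess() directly, the same options can be passed through the preProcOptions argument of trainControl(). The sketch below assumes a two-class subset of iris as example data and uses "glm" as a stand-in model to avoid extra dependencies. An SVM method such as "svmRadial" should work the same way but needs the kernlab package.

```r
library(caret)

# Two-class subset of iris; example data, not from the original post.
dat <- droplevels(subset(iris, Species != "setosa"))

set.seed(1)
ctrl <- trainControl(
  method = "cv",
  number = 5,
  # Options forwarded to preProcess(); list(thresh = 0.8) would set a
  # variance cutoff instead of a fixed number of components.
  preProcOptions = list(pcaComp = 2)
)

fit <- train(
  Species ~ .,
  data = dat,
  method = "glm",        # stand-in model; swap in an SVM method if kernlab is installed
  preProcess = "pca",
  trControl = ctrl
)

# Number of components kept by the pre-processing fitted on the full training set.
fit$preProcess$numComp
```

Because the pre-processing is requested inside train(), caret re-estimates the PCA on the training portion of each resample instead of once on the full data set.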

Please see: https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/preProcess

answered Nov 12 '14 at 22:35 by Jacques Wainer · edited Mar 24 '19 at 12:44 by Roman Abashin

Unfortunately the link is broken – Roman Kiselev Sep 28 '17 at 10:48
corrected the link – Jacques Wainer Sep 28 '17 at 17:55
