
I want to know whether we can calculate accuracy for principal component regression in R. The Target variable has only two values, 0 or 1. I tried converting training$Target and validation$Target to factors, but then I get an error saying col(y) must be numeric, so I am leaving them as they are.

pcr.fit <- pcr(training$Target ~ ., data = training, validation = "CV")

summary(pcr.fit)
validationplot(pcr.fit)
coefplot(pcr.fit)

pcr.predictions <- predict(pcr.fit, newdata, ncomp = 3, type = "response")

conf <- confusionMatrix(pcr.predictions, validation$Target)

Error in confusionMatrix.default(pcr.predictions, validation$Target) : 
  the data cannot have more levels than the reference
I also tried changing type to "class", but that fails with: Error in match.arg(type) : 'arg' should be one of “response”, “scores”
  • I would recommend using a library like caret or mlr to take care of these cross-validation issues in a structured and coherent manner. That said it is unclear if the target value is a label or actually numeric... I believe you want to ultimately classify things but in that case pcr is not the proper tool as it works only for regression. – usεr11852 Apr 02 '18 at 13:07
  • It does appear you are trying to predict a dichotomous outcome with a method that assumes normal distributions. Consider clarifying the question or requesting information on which technique is appropriate. – Gregg H Apr 02 '18 at 13:17

1 Answer


Without the package names at the top of the code, I cannot determine exactly what these functions do. However, the culprit seems to be that you are predicting values on a continuum and then trying to compute a confusion matrix, which requires discrete categories. When predictions are on a continuum, probably every single prediction is at least a little bit incorrect. There are also likely to be more distinct predicted values than categories, which is completely consistent with the error message saying that the predictions have more levels (distinct values) than the original data.
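As a rough illustration of the mismatch, here is a minimal base-R sketch. The vectors `target` and `preds` are simulated stand-ins (assumptions, not the asker's data) for a 0/1 outcome and the numeric output of a regression-type model:

```r
set.seed(1)

# Hypothetical 0/1 outcome, standing in for validation$Target
target <- rbinom(20, size = 1, prob = 0.5)

# Hypothetical continuous predictions, standing in for regression output
preds <- 0.6 * target + runif(20, min = 0, max = 0.4)

# The outcome has at most two distinct values ...
n_target_values <- length(unique(target))

# ... while the continuous predictions are almost surely all distinct
n_pred_values <- length(unique(preds))

print(n_target_values)
print(n_pred_values)
```

If both vectors were treated as factors, the predictions would carry many more levels than the reference, which matches the situation the error message describes.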

One way to solve this is to apply a threshold to the continuous output, binning the predictions into discrete categories. There are issues with doing this, and I encourage all readers to go through that link if they are unfamiliar with this material. Thresholding will, however, give predicted categories for which a confusion matrix can be calculated.
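The thresholding idea could be sketched in base R roughly as follows. Everything here is illustrative: `truth` and `scores` are simulated stand-ins for validation$Target and the numeric predictions, and the cutoff of 0.5 is an arbitrary assumption rather than a recommended value:

```r
set.seed(42)

# Hypothetical stand-ins for validation$Target and the numeric predictions
truth  <- rbinom(100, size = 1, prob = 0.4)
scores <- 0.3 + 0.4 * truth + rnorm(100, sd = 0.2)

# Assumed cutoff; choosing it is exactly where the issues mentioned above arise
threshold  <- 0.5
pred_class <- as.integer(scores > threshold)

# Give both factors the same two levels so the table is always 2 x 2
pred_f  <- factor(pred_class, levels = c(0, 1))
truth_f <- factor(truth, levels = c(0, 1))

# Confusion matrix and accuracy (share of cases on the diagonal)
conf_tab <- table(Predicted = pred_f, Actual = truth_f)
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)

print(conf_tab)
print(accuracy)
```

With the objects from the question, and assuming the functions come from the pls and caret packages, the thresholded factors could then be passed to confusionMatrix(pred_f, truth_f) instead of the raw predictions. As far as I know, predict() on a pls fit returns a three-dimensional array, so it would probably need flattening first, for example with as.vector().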

Dave