It is easy to find a package calculating area under ROC, but is there a package that calculates the area under precision-recall curve?
ROCR, pROC - are really nice! – Vladimir Chupakhin Sep 05 '11 at 15:37
They certainly are, yet AFAIK neither can calculate the area under the precision-recall curve. – Sep 05 '11 at 16:57
4 Answers
As of July 2016, the package PRROC works great for computing both ROC AUC and PR AUC.
Assuming you already have a vector of predicted probabilities from your model (called probs) and the true class labels are in your data frame as df$label (coded 0 and 1), this code should work:
install.packages("PRROC")  # only needed once
library(PRROC)
# Scores of the positive (label 1) and negative (label 0) examples
fg <- probs[df$label == 1]
bg <- probs[df$label == 0]
# ROC Curve
roc <- roc.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
plot(roc)
# PR Curve
pr <- pr.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
plot(pr)
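PS: The only disconcerting thing is the argument naming: scores.class0 takes the scores of the positive class (fg, i.e. label 1 here), and scores.class1 takes the scores of the negative class (bg, label 0).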
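To get the areas as numbers rather than only as plots, here is a minimal sketch on made-up data. The toy label and probs vectors are hypothetical stand-ins for df$label and probs above. It assumes the objects returned by roc.curve() and pr.curve() expose auc and auc.integral, and that pr.curve() accepts weights.class0 so the 0/1 labels can be passed directly. The requireNamespace() guard only skips the example when PRROC is not installed.

```r
# Toy data (hypothetical): 0/1 labels and scores that tend to be higher for label 1
set.seed(42)
label <- rbinom(1000, 1, 0.2)
probs <- plogis(rnorm(1000, mean = ifelse(label == 1, 1, -1)))

if (requireNamespace("PRROC", quietly = TRUE)) {
  fg <- probs[label == 1]  # scores of the positive class
  bg <- probs[label == 0]  # scores of the negative class

  roc_obj <- PRROC::roc.curve(scores.class0 = fg, scores.class1 = bg)
  pr_obj  <- PRROC::pr.curve(scores.class0 = fg, scores.class1 = bg)

  roc_auc <- roc_obj$auc          # area under the ROC curve
  pr_auc  <- pr_obj$auc.integral  # area under the PR curve

  # Alternative call: pass all scores once and use the 0/1 labels as weights,
  # which avoids splitting the scores into fg and bg by hand
  pr_w <- PRROC::pr.curve(scores.class0 = probs, weights.class0 = label)

  print(c(roc_auc = roc_auc, pr_auc = pr_auc, pr_auc_weights = pr_w$auc.integral))
}
```

The weights.class0 form may read more naturally if the class0/class1 naming is confusing, since the labels are passed as they are.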
Here are the example ROC and PR curves with the areas under them:
The bars on the right are the threshold probabilities at which a point on the curve is obtained.
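Note that for a random classifier, ROC AUC will be close to 0.5 irrespective of the class imbalance. The PR AUC is trickier: its baseline depends on the class balance, and for a random classifier it is roughly the fraction of positive examples (see What is "baseline" in precision recall curve).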
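The baseline behaviour can be checked with a small base-R simulation. This is only a sketch: the data are simulated, all variable names are made up for illustration, and average precision is used as one common estimate of PR AUC (it is not the same estimator PRROC uses).

```r
set.seed(1)
n <- 10000
labels <- rbinom(n, 1, 0.1)  # about 10% positives (imbalanced)
scores <- runif(n)           # random scores, unrelated to the labels

# ROC AUC via the rank-sum (Mann-Whitney) formula
n1 <- sum(labels == 1)
n0 <- sum(labels == 0)
roc_auc <- (sum(rank(scores)[labels == 1]) - n1 * (n1 + 1) / 2) / (as.numeric(n1) * n0)

# Average precision: mean of the precision values at the rank of each positive
ord <- order(scores, decreasing = TRUE)
y <- labels[ord]
precision_at_k <- cumsum(y) / seq_along(y)
avg_precision <- mean(precision_at_k[y == 1])

prevalence <- mean(labels)
print(round(c(roc_auc = roc_auc, avg_precision = avg_precision, prevalence = prevalence), 3))
```

With random scores, roc_auc should land near 0.5 while avg_precision should land near prevalence, so a PR AUC only means something relative to the fraction of positives.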
AUPRC() is a function in the PerfMeas package that is much faster than pr.curve() from the PRROC package when the data is very large.
pr.curve() is a nightmare and takes forever to finish when you have vectors with millions of entries, whereas PerfMeas takes seconds in comparison. PRROC is written in R and PerfMeas is written in C.
A little googling returns one Bioconductor package, qpgraph (qpPrecisionRecall), and one CRAN package, minet (auc.pr). I have no experience with them, though. Both were designed for biological networks.
minet looked nice, but it needs some external adapter to build appropriate input from general data :-( – May 08 '11 at 09:09
Once you've got a precision recall curve from qpPrecisionRecall, e.g.:
pr <- qpPrecisionRecall(measurements, goldstandard)
you can calculate its AUC by doing this:
f <- approxfun(pr[, 1:2])
auc <- integrate(f, 0, 1)$value
The help page of qpPrecisionRecall gives details on the data structures it expects for its arguments.
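Since approxfun() interpolates linearly, the integral above is the area under a piecewise-linear curve, which a plain trapezoidal sum also computes without adaptive quadrature. Here is a minimal base-R sketch, assuming a two-column matrix with recall in column 1 and precision in column 2. The toy matrix pr_toy and the helper trapz_auc are hypothetical and not part of qpgraph. Keep in mind that linear interpolation between PR points is generally considered optimistic, so treat the result as an approximation.

```r
# Trapezoidal area under a curve given as (recall, precision) points
trapz_auc <- function(recall, precision) {
  ord <- order(recall)  # integrate from low to high recall
  r <- recall[ord]
  p <- precision[ord]
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

# Hypothetical PR points, for illustration only
pr_toy <- cbind(Recall    = c(0, 0.25, 0.5, 0.75, 1),
                Precision = c(1, 0.9, 0.7, 0.5, 0.3))

auc_toy <- trapz_auc(pr_toy[, 1], pr_toy[, 2])
print(auc_toy)  # 0.6875
```

If the curve does not span the whole recall range from 0 to 1, the sum only covers the part that is present, whereas integrate(f, 0, 1) would fail on the missing range because approxfun() returns NA outside the data by default.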
Doesn't the PR-curve require some more fancy integration? See: http://mnd.ly/oWQQw1 – Aug 31 '11 at 12:37

