
I'm working on developing a confidence interval for classification predictions. Say the model I'm working on predicts whether a person defaults on a loan. I want to construct a confidence interval for the total number of people predicted to default.

Running the model several times would take too much time, so I was looking for a way to do this without rerunning it. The method I ended up with was to create a matrix of random numbers between 0 and 1 and compare the predicted probabilities against these random numbers.

sim.matrix <- replicate(100, runif(100, 0, 1))  # 100 simulations x 100 people of Uniform(0, 1) draws
predictions <- runif(100, 0, 1)                 # placeholder for the model's predicted probabilities
quantile(colSums(predictions > sim.matrix), probs = c(.025, .975))  # 95% interval for the simulated default counts

I think this would work if my predicted probabilities were normally distributed. The issue is that they aren't: half of my values are below 0.01, and because of this the confidence interval I get is lower than expected. Does anyone have a fairly quick and efficient method to construct this confidence interval without rerunning the model several times?
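
For concreteness, here is a rough sketch of the same simulation with a skewed set of predicted probabilities standing in for mine (pred.probs and the rbeta call are just placeholders, not my actual model output):

pred.probs <- rbeta(100, 0.5, 5)   # placeholder: heavily right-skewed probabilities, like mine
n.sim <- 1000                      # number of simulated data sets

# One uniform draw per person per simulation; a person "defaults" in a
# simulation when their uniform falls below their predicted probability.
sim.counts <- replicate(n.sim, sum(runif(length(pred.probs)) < pred.probs))
quantile(sim.counts, probs = c(.025, .975))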

intern
    What is the model? (It has probabilistic output? logistic? Bayes? other?) – GeoMatt22 May 07 '17 at 02:20
  • I was using a gbm for classification with two classes – intern May 07 '17 at 12:58
  • That sounds like important work. You may want to check out Bates, Hastie & Tibshirani (2021); they are doing something similar: https://arxiv.org/abs/2104.00673 – GuillaumeL Mar 28 '22 at 11:56
  • Also see this answer by user ab90hi, who has done some simulations related to what I think you want to do: https://stats.stackexchange.com/a/255131/133345 – GuillaumeL Mar 28 '22 at 12:09
