
If a machine learning classification model is used to predict a binary outcome for 1000 observations daily, and we only care about the precision of the top 100 predictions, how can we use a custom evaluation metric?

More details

  • For the business case, we can assume that the model predicts the probability of up-selling. There are 1000 daily cases to be analyzed. If the model predicts "yes", then a salesperson will call the customer.
  • But there are not enough people to call more than 100 customers.
  • So we want to optimize the model only for the top 100 customers (in terms of predicted probability).
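The objective described above is commonly called precision at k. A minimal sketch of such a custom metric in Python could look like this (the function name and the example data are illustrative, not from the thread):

```python
import numpy as np

def precision_at_k(y_true, y_score, k=100):
    """Precision among the k observations with the highest predicted scores."""
    # Indices of the k highest predicted probabilities
    top_k = np.argsort(y_score)[::-1][:k]
    # Fraction of those top-k cases that are actual positives
    return np.mean(y_true[top_k])

# Illustrative example: 1000 daily cases, evaluate only the top 100
rng = np.random.default_rng(0)
y_score = rng.random(1000)                              # predicted probabilities
y_true = (rng.random(1000) < y_score).astype(int)       # synthetic outcomes
print(precision_at_k(y_true, y_score, k=100))
```

A function of this shape can usually be plugged into a library's custom-metric hook (e.g. a scorer in scikit-learn or a custom eval function in gradient-boosting libraries), though the exact signature each library expects differs.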
John Smith
  • Maybe by precision you mean accuracy? Is your output variable categorical or continuous? What do you mean by top 100 predictions? – Romain Mar 21 '21 at 13:01
  • I added more details; hopefully it is clearer now. – John Smith Mar 21 '21 at 18:20
  • Frank Harrell has an example like this in his blog. The gist is that, if you predict probabilities instead of categories, you can determine the $100$ customers most likely to buy your product. Log loss and Brier score are examples of metrics that seek such probabilities. The good news is that your software probably minimizes log loss (default in logistic regression). – Dave Mar 21 '21 at 18:37
  • Thank you @Dave, do you have the link of the post of Frank Harrell's blog? – John Smith Mar 21 '21 at 19:12
  • https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ – Dave Mar 21 '21 at 19:42
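Dave's suggestion can be sketched as follows: fit a probability model (logistic regression minimizes log loss by default) and simply rank customers by predicted probability, calling the top 100. The data below are synthetic and scikit-learn is assumed as the modeling library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only: 1000 customers, 5 features
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Logistic regression is fit by minimizing log loss, so its
# predicted probabilities are suitable for ranking customers
model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]

# The 100 customers with the highest predicted probability:
# these are the ones the sales team would call
top_100 = np.argsort(proba)[::-1][:100]
```

Note that this approach never needs a hard "yes/no" classification threshold at all: the business constraint (100 calls) picks the cutoff implicitly, which is the gist of the linked posts.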