
I've written and trained a churn model that is scheduled to run every day, producing a new prediction of each customer's probability of churning within the coming 365 days, counted from the day the scoring is done.

For example, today is 2024-02-03, and after running the scoring, each of our 3 million customers gets a propensity: the probability of churning within the coming 365 days. The output looks something like this:

CustomerID   Propensity   DaysUntilChurn
528395875    0.67         82
493575722    0.98         2
452772672    0.03         354

The churn prediction is 1 if Propensity >= 0.5, else 0. Naturally, the propensity to churn decreases when the customer still has many days left to place an order before being considered churned. Conversely, if a customer has not placed an order within the last 364 days, there is only 1 day left until churn, and the probability of it happening is high. DaysUntilChurn = 365 - DaysSinceLastOrder.
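The thresholding rule and the DaysUntilChurn definition can be sketched in plain Python using the example rows above (the DaysSinceLastOrder values are back-calculated from the table and purely illustrative):

```python
# Hypothetical scoring output: (CustomerID, Propensity, DaysSinceLastOrder).
scores = [
    (528395875, 0.67, 283),
    (493575722, 0.98, 363),
    (452772672, 0.03, 11),
]

results = []
for customer_id, propensity, days_since_last_order in scores:
    days_until_churn = 365 - days_since_last_order  # DaysUntilChurn = 365 - DaysSinceLastOrder
    prediction = 1 if propensity >= 0.5 else 0      # hard threshold at 0.5
    results.append((customer_id, propensity, days_until_churn, prediction))
```

This reproduces the table above: days until churn of 82, 2 and 354, with churn predictions 1, 1 and 0.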

I now want to write a program that monitors the accuracy of this model daily. The issue is that if I make predictions on 2024-02-03, I have to wait until 2025-02-03 to collect the actual churn outcomes for all customers.

One way I thought of was to save all predictions an entire year back, but unfortunately this is not viable as it entails saving 365 data sets, each with 3 million records.

How do other retail companies monitor the daily accuracy of their churn models? Are there any other tools/metrics to look at when one wants to closely monitor model performance daily?

Parseval
  • Re "not viable:" That scarcely seems like a problem, because the total amount of data is approximately one billion pairs of numbers and each number requires only one to two bytes for its representation. – whuber Feb 03 '24 at 15:48
  • Hi, thanks for your comment, but I'm not sure I follow. Every day, 3 million predictions are made and stored with the columns CustomerID, Propensity, DaysUntilChurn and ScoringDate, where the last column indicates the date when the customer was scored. So it's four variables to be saved, not a pair. – Parseval Feb 03 '24 at 16:19
  • Not so! First, you can store the customer IDs once and for all; and it sounds like "ScoringDate" could be placed in the name of the file: isn't it constant for a given day? According to your question, you only have a two-decimal digit "Propensity" value and a three-decimal digit "DaysUntilChurn" value. With just the tiniest bit of compression that pair requires only two bytes of storage, yielding a 6 MB file. – whuber Feb 03 '24 at 17:33
  • As whuber writes, you should absolutely be able to store this amount of information in a standard database. And I would recommend you revisit the part of comparing probabilistic predictions to a hard threshold at 0.5: https://stats.stackexchange.com/q/312119/1352 – Stephan Kolassa Feb 03 '24 at 17:54
  • Also, do you really need to store all this information? If you don't retrain your model every day, then you just need to store the models (which may or may not be smaller than the predictions) and score the correct model for the current day. – Stephan Kolassa Feb 03 '24 at 18:02
  • @StephanKolassa: Sure, limited storage is not an issue, but I take it that in order to evaluate the model scoring used today, we're going to need to wait a year? No, we don't retrain it daily. The purpose of the monitoring is to know when the accuracy metric falls below a predetermined threshold, and then retrain the model IF that happens. What do you mean by "store the models"? The trained model is stored and then used to make predictions, but these predictions need to be saved for the whole year, as I understand it? Btw, thanks a lot for the link about thresholds! – Parseval Feb 05 '24 at 13:44
  • Well, if you can just store the model (for a linear regression, that would essentially be the coefficient estimates, and for a neural network that would just be the trained network), then you can typically evaluate it very quickly. So I would not evaluate it for a full year in advance and store the results, but evaluate the trained model daily, compare to actuals, and only store a quality KPI over time. Make sense? – Stephan Kolassa Feb 05 '24 at 14:40
  • @StephanKolassa: Forgive me, Mr. Kolassa, but I don't see how I'm going to evaluate the model daily without collecting the actual outcomes. Assume I do the scoring on 2024-02-05 for all 3 million customers. To evaluate this against the actual outcome, I need to wait 365 days and then see which of the customers scored a year ago actually did and did not churn. What am I missing here? – Parseval Feb 05 '24 at 14:52
  • Why do you need to evaluate today? Can't you just store the model until 2025-02-05 and only then evaluate your model using today's data to predict who should have churned by now, then evaluate prediction? That way, you just need to evaluate the model once per day based on today's features, and you don't need to store all the predictions. – Stephan Kolassa Feb 05 '24 at 15:10
  • I get it now, thanks for the clarification. But in that case I guess I can't really monitor the accuracy of the model on a daily basis (until a year has passed). That was management's requirement: "We want a method to evaluate how well the model performs on a daily basis - compared to actual outcome." Doing it as you suggested in your latest comment still entails waiting a year before we know how good/bad the model is. Well, I guess I'm not a wizard. If it's not possible, it's not possible. Thanks @StephanKolassa. If you could write an answer just to summarize, I'll accept and upvote. – Parseval Feb 05 '24 at 15:39
  • I will see whether I find the time to write up an answer. I still wonder whether you couldn't store multiple models, e.g., retrained monthly. Then score last month's model with last month's features and evaluate to get an idea of how good last month's one-month-ahead predictions were. Then do the same with the two-month-old model using two-month-old data, to see whether that model's two-months-ahead forecasts were fine. And so forth. But as whuber wrote above, it should be possible to do this with a lot less contortions... – Stephan Kolassa Feb 06 '24 at 06:56
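whuber's back-of-the-envelope storage estimate in the comments above can be checked with a quick NumPy sketch (dummy random data; real values would come from the scoring job): two decimal digits of Propensity fit in one byte, and DaysUntilChurn (0..365) fits in two.

```python
import numpy as np

n_customers = 3_000_000
rng = np.random.default_rng(0)

# Store 100 * Propensity as an integer 0..100 in one byte,
# and DaysUntilChurn in two bytes.
propensity_pct = (rng.random(n_customers) * 100).round().astype(np.uint8)
days_until_churn = rng.integers(0, 366, n_customers, dtype=np.uint16)

# Customer IDs can be stored once for all days, and the scoring date can live
# in the file name, so the daily file only needs this pair of columns.
daily_bytes = propensity_pct.nbytes + days_until_churn.nbytes
print(daily_bytes / 1e6, "MB per day before compression")  # 9.0 MB
```

At roughly 9 MB per day uncompressed (less after compression), a full year of predictions is on the order of a few gigabytes, which comfortably fits in a standard database or flat files.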
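Stephan Kolassa's rolling-evaluation idea from the comments above can be sketched roughly as follows (all names are illustrative; the Brier score is used in place of thresholded accuracy, in the spirit of the linked thread on proper scoring rules): each day, look up the predictions whose 365-day window has just closed, compare them with the outcomes observed since, and store only the resulting KPI.

```python
from datetime import timedelta

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes
    (a proper scoring rule; lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def daily_kpi(today, stored_predictions, actual_churn):
    """Evaluate the scoring run whose 365-day horizon closed today.

    stored_predictions: {scoring_date: {customer_id: propensity}}
    actual_churn:       {customer_id: 0/1 outcome observed by `today`}
    Returns the Brier score, or None if no run matured today.
    """
    scoring_date = today - timedelta(days=365)
    preds = stored_predictions.get(scoring_date)
    if preds is None:
        return None
    customers = list(preds)
    return brier_score([preds[c] for c in customers],
                       [actual_churn[c] for c in customers])
```

In practice `stored_predictions` would be the compact daily files (or re-scored archived models) discussed in the comments, and the monitoring job would append one KPI row per day, triggering retraining when the KPI drifts past a threshold.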
