I'm sending a number of DNA test kits to customers each day. The customers swab their cheeks to gather DNA and send back the kit for processing.
I have data on every test kit sent over the past two years: the date it was sent, the state it was sent to, and the date it was received back (if it has been received).
We process each kit the day it is returned. I'm trying to forecast how many kits will be returned on each of the next 60 days so I can staff processing accordingly. Each state processes its own tests, so I need to estimate the number of kits to be processed in each state on each of the next 60 days (50 states × 60 days = 3,000 estimates, updated daily).
Many customers return their kits within one week, but some take a few weeks, sometimes up to two months. Some never return their kits at all. (We can treat kits sent more than two months ago as if they will never be returned if it simplifies the problem.)
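To make the delay pattern concrete, here is a minimal sketch of estimating the return-delay distribution under the simplification above: only kits sent more than 60 days ago are used, and any of them not back within 60 days counts as never returned. The function name `delay_distribution` and the `(sent, returned)` tuple layout are my own assumptions; per-state curves would come from filtering the kits by state first.

```python
from collections import Counter
from datetime import date, timedelta

MAX_DELAY = 60  # assumption: kits not back within 60 days are treated as never returned


def delay_distribution(kits, as_of, max_delay=MAX_DELAY):
    """Estimate p[d] = P(kit is returned exactly d days after sending), d = 0..max_delay.

    Only kits sent more than max_delay days before `as_of` are used, so each of them
    has had its whole window observed. The leftover mass 1 - sum(p) is the
    estimated probability of never returning.
    """
    cutoff = as_of - timedelta(days=max_delay)
    eligible = [(sent, returned) for sent, returned in kits if sent < cutoff]
    counts = Counter()
    for sent, returned in eligible:
        if returned is not None and (returned - sent).days <= max_delay:
            counts[(returned - sent).days] += 1
    n = len(eligible)
    if n == 0:
        return [0.0] * (max_delay + 1)
    return [counts[d] / n for d in range(max_delay + 1)]


# Hypothetical example: delays of 2, 7 and 2 days, plus one kit that never came back.
kits = [
    (date(2023, 1, 1), date(2023, 1, 3)),
    (date(2023, 1, 1), date(2023, 1, 8)),
    (date(2023, 1, 2), date(2023, 1, 4)),
    (date(2023, 1, 2), None),
]
p = delay_distribution(kits, as_of=date(2023, 6, 1))
print(p[2], p[7], 1 - sum(p))  # 0.5 0.25 0.25
```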
We will update the forecasts each day. Forecasts far out are expected to be less reliable; most of the kits that will be processed in 60 days haven't been sent out or even ordered yet. Forecasts for dates within the next few days should be more reliable since we know how many kits are outstanding.
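The near-term part can be written down directly: a kit sent a days ago that is still out has survived days 0 through a, so its chance of coming back k days from now is p[a + k] / S(a), where p[d] is the probability of a return exactly d days after sending and S(a) = 1 - (p[0] + ... + p[a]). Summing that over all outstanding kits gives an expected count for each future day. A minimal sketch, assuming p has already been estimated and today's returns are already recorded (the function name is hypothetical):

```python
from datetime import date, timedelta


def expected_returns_from_outstanding(outstanding_sent_dates, p, today, horizon=60):
    """Expected returns on days today+1 .. today+horizon from kits already sent
    but not yet returned.

    p[d] is the probability a kit returns exactly d days after sending.
    A kit of age a (days since sent) that is still out has survived days 0..a,
    so it returns k days from now with probability p[a + k] / S(a),
    where S(a) = 1 - sum(p[0..a]).
    """
    expected = [0.0] * horizon
    for sent in outstanding_sent_dates:
        age = (today - sent).days
        survival = 1.0 - sum(p[: age + 1])
        if survival <= 0:
            continue  # under p, this kit can no longer come back
        for k in range(1, horizon + 1):
            if age + k < len(p):
                expected[k - 1] += p[age + k] / survival
    return expected


today = date(2023, 6, 1)
p = [0.0, 0.5, 0.0, 0.25, 0.0]  # hypothetical delay distribution (max delay 4 days)
outstanding = [today - timedelta(days=1), today]
print(expected_returns_from_outstanding(outstanding, p, today, horizon=4))
# [0.5, 0.5, 0.25, 0.0]
```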
A prediction interval, e.g. 30-50 kits at 80% confidence in NY on June 1, 2023, would be great, but a point estimate would be helpful too.
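One way to get an interval rather than only a point estimate, sketched here under stated assumptions (a known delay distribution p, ages in days since sending, today's returns already recorded), is Monte Carlo simulation: draw each outstanding kit's return day (or no return) many times, count returns per day, and read off empirical quantiles. This covers only kits already sent; uncertainty from kits not yet sent would have to be added on top. All names are hypothetical.

```python
import random


def simulate_interval(outstanding_ages, p, horizon=60, n_sims=2000, level=0.8, seed=0):
    """Approximate `level` interval for daily returns from outstanding kits.

    In each simulation, a kit of age a comes back k days from now with
    probability p[a + k] / S(a), or does not come back within the horizon.
    Returns one (low, high) pair per future day, taken from the sorted
    simulated counts (a simple nearest-rank quantile).
    """
    rng = random.Random(seed)
    sims = [[0] * horizon for _ in range(n_sims)]
    days = list(range(1, horizon + 1)) + [None]  # None = not back within horizon
    for age in outstanding_ages:
        survival = 1.0 - sum(p[: age + 1])
        if survival <= 0:
            continue
        weights = [p[age + k] / survival if age + k < len(p) else 0.0
                   for k in range(1, horizon + 1)]
        weights.append(max(0.0, 1.0 - sum(weights)))
        draws = rng.choices(days, weights=weights, k=n_sims)
        for sim, k in zip(sims, draws):
            if k is not None:
                sim[k - 1] += 1
    lo_q, hi_q = (1 - level) / 2, (1 + level) / 2
    intervals = []
    for day in range(horizon):
        counts = sorted(sim[day] for sim in sims)
        intervals.append((counts[int(lo_q * (n_sims - 1))],
                          counts[int(hi_q * (n_sims - 1))]))
    return intervals


ages = [0] * 30 + [3] * 20 + [10] * 15  # hypothetical outstanding kits in one state
p = [0.0] + [0.1] * 7 + [0.01] * 20     # hypothetical delay distribution
print(simulate_interval(ages, p, horizon=5)[:3])
```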
The data looks like this:
Sent,Returned,State
2023-01-02,2023-01-10,CA
2023-01-02,2023-01-15,NY
2023-01-04,NA,CA
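For reference, a minimal way to load rows in this format with Python's standard `csv` module, assuming `NA` is the only marker for an unreturned kit (`load_kits` and the tuple layout are illustrative, not an existing API):

```python
import csv
import io
from datetime import date

# Sample rows in the format shown above; "NA" marks a kit not yet returned.
SAMPLE = """Sent,Returned,State
2023-01-02,2023-01-10,CA
2023-01-02,2023-01-15,NY
2023-01-04,NA,CA
"""


def load_kits(text):
    """Parse the CSV text into (sent, returned_or_None, state) tuples."""
    kits = []
    for row in csv.DictReader(io.StringIO(text)):
        sent = date.fromisoformat(row["Sent"])
        returned = None if row["Returned"] == "NA" else date.fromisoformat(row["Returned"])
        kits.append((sent, returned, row["State"]))
    return kits


kits = load_kits(SAMPLE)
for sent, returned, state in kits:
    # Delay in days between sending and return, or None if still outstanding.
    delay = (returned - sent).days if returned is not None else None
    print(state, sent, delay)
```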
What's a good way of modelling this problem?
I've been looking into using ARIMA on the number of kits returned each day, but I'm not sure how to make the predictions account for the kits that have been sent but not yet returned.
I also started looking at Cox survival modeling, but I'm not clear on how to turn it into a prediction of the number of kits returned each day.
I also thought about using multiple linear regression with 60 columns for the number of kits sent 1 day ago, 2 days ago, etc., but I'm not sure how to deal with the sparseness as we try to predict further and further out.
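On the survival side, kits that are still out are censored, not failures. Below is a minimal sketch of a discrete Kaplan-Meier style estimate of the delay distribution: a simpler, covariate-free stand-in for a Cox model (a Cox model would additionally let the curve depend on covariates such as state). It keeps recent outstanding kits in the risk set only for the days they have actually been observed. Whatever survival model is used, the per-kit conditional return probabilities it implies can be summed over outstanding kits to get an expected count per day. The function name and `(sent, returned)` layout are assumptions.

```python
from datetime import date


def km_delay_distribution(kits, as_of, max_delay=60):
    """Kaplan-Meier style estimate of p[d] = P(return exactly d days after sending).

    A returned kit is at risk on delays 0..r and has its event at delay r.
    An unreturned kit of age a (or one returned after max_delay) is censored:
    it is at risk on delays 0..min(a, max_delay) with no event.
    Discrete hazard h[d] = events[d] / at_risk[d]; p[d] = S(d-1) * h[d].
    """
    at_risk = [0] * (max_delay + 1)
    events = [0] * (max_delay + 1)
    for sent, returned in kits:
        if returned is not None and (returned - sent).days <= max_delay:
            last = (returned - sent).days
            events[last] += 1
        else:
            last = min((as_of - sent).days, max_delay)
        for d in range(last + 1):
            at_risk[d] += 1
    p, survival = [], 1.0
    for d in range(max_delay + 1):
        hazard = events[d] / at_risk[d] if at_risk[d] else 0.0
        p.append(survival * hazard)
        survival *= 1.0 - hazard
    return p


as_of = date(2023, 6, 1)
kits = [
    (date(2023, 1, 1), date(2023, 1, 2)),
    (date(2023, 1, 1), date(2023, 1, 2)),
    (date(2023, 1, 1), None),   # old and never returned
    (date(2023, 5, 29), None),  # sent 3 days ago: censored, not counted as "never"
]
p = km_delay_distribution(kits, as_of)
print(p[1], sum(p))  # 0.5 0.5
```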
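The 60-lag regression has the shape returns[t] ≈ p[0]·sends[t] + p[1]·sends[t-1] + ..., where the lag coefficients play the role of delay probabilities. One way to handle lags that fall in the future is to fill them with a forecast of kits sent. Here the sends forecast is an assumed input; a time-series model such as ARIMA on daily sends is one possible source. A minimal sketch of the contribution from kits not yet sent, with hypothetical names and the delay probabilities fixed rather than regressed:

```python
def forecast_from_new_sends(future_sends, p, horizon=60):
    """Expected returns on days 1..horizon from today, from kits not yet sent.

    future_sends[j] is an assumed forecast of kits sent on day j + 1 from today.
    A kit sent on day s returns on day k (k >= s) with probability p[k - s].
    """
    expected = [0.0] * horizon
    for j, n_sent in enumerate(future_sends[:horizon]):
        send_day = j + 1
        for k in range(send_day, horizon + 1):
            d = k - send_day
            if d < len(p):
                expected[k - 1] += n_sent * p[d]
    return expected


p = [0.0, 0.5, 0.25]  # hypothetical delay distribution
print(forecast_from_new_sends([10, 10, 10], p, horizon=3))  # [0.0, 5.0, 7.5]
```

Adding this to the contribution from kits already outstanding would give the total expected returns for each day; the far-out days then rely mostly on the sends forecast, which matches the expectation that they are less reliable.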