
My team resolves online tasks assigned to them through a queue system, and the time taken to clear each task is recorded (called handle_seconds). Approximately 18% of the tasks turn out to be defective. I want to check whether there is a relationship between the time taken to complete a task (handle_seconds) and the chance that its output is defective.

One option I considered is bucketing the tasks and defects into time windows of, say, one minute of handle_seconds and looking at the trend in the share of defects across windows. But handle_seconds ranges from 16 seconds to sometimes a few hours, so the range is very large.

I wanted to know if there is a more accurate and established way of answering this, such as a statistical test or model.

1 Answer


This seems like a simple application of binomial logistic regression.

The outcome would be a 0/1 binary variable for no-defect/defect. The main predictor variable of interest would be handle_seconds, treated as a continuous variable. The wide range of that predictor can be addressed by working with its logarithm and by using a flexible model of its association with outcome, for example via regression splines. Breaking it up into bins is not a good idea.
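As a concrete illustration, here is a minimal sketch in Python using statsmodels, with simulated data standing in for your queue logs (the data-generating numbers are made up; `bs()` is the B-spline basis that patsy exposes inside model formulas):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate tasks: handle_seconds spans 16 s to a few hours, and the
# defect probability rises with log(handle_seconds) (hypothetical data).
rng = np.random.default_rng(0)
n = 2000
handle_seconds = rng.lognormal(mean=5.0, sigma=1.2, size=n).clip(16, 4 * 3600)
p = 1 / (1 + np.exp(-(-6 + 0.9 * np.log(handle_seconds))))
defect = rng.binomial(1, p)
df = pd.DataFrame({"handle_seconds": handle_seconds, "defect": defect})

# Binomial logistic regression with a flexible spline of the
# log-transformed predictor, per the advice above.
model = smf.logit("defect ~ bs(np.log(handle_seconds), df=4)", data=df).fit(disp=0)
print(model.summary())

# Predicted defect probability for a 1-minute vs. a 1-hour task.
pred = model.predict(pd.DataFrame({"handle_seconds": [60, 3600]}))
print(pred.values)
```

The spline lets the log-odds of a defect curve with log time instead of forcing a straight line, which is exactly what binning was trying to approximate, but without throwing away within-bin information.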

If there are other aspects of the tasks that might be associated with outcome, those can also be incorporated into the logistic regression.
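Additional predictors slot straight into the formula. A sketch with a hypothetical categorical covariate `task_type` (invented here purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: defect risk depends on log(handle_seconds) and,
# hypothetically, on whether the task was an escalation.
rng = np.random.default_rng(1)
n = 1500
handle_seconds = rng.lognormal(5.0, 1.2, n).clip(16, 4 * 3600)
task_type = rng.choice(["routine", "escalation"], n)
logit_p = -6 + 0.9 * np.log(handle_seconds) + 0.7 * (task_type == "escalation")
defect = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))
df = pd.DataFrame({"handle_seconds": handle_seconds,
                   "task_type": task_type, "defect": defect})

# Same spline term as before, plus the categorical covariate.
fit = smf.logit("defect ~ bs(np.log(handle_seconds), df=4) + C(task_type)",
                data=df).fit(disp=0)
print(fit.params)
```

The fitted coefficient for `task_type` then tells you how defect odds differ between task types at any given handle time.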

This UCLA web page has links to how to implement logistic regression with several software packages.

EdM