
I want to model the duration of a task. My predictors are the number of actions required to complete the task (every action takes the same time to execute) and the category of the task, which is a categorical variable that I have converted into dummy variables.

What troubles me is that my output variable is a duration, specifically a number of seconds, so it is positive and continuous on [0, +∞). What type of regression can I choose for this problem?

A first quick thought was to predict the log(duration) with some method like linear regression, regression decision tree or SVR but then again when exponentiating the results in order to make them interpretable as seconds, I will come up to negative time too.

Note: I would prefer not to mess with neural nets. I'm sure there is an easier solution.

In case it helps, my training data looks like this:

+------------+---------+------------+---------+
|dur(sec)    |actions  |task_catA   |task_catB|
+------------+---------+------------+---------+
| 1256       | 257     | 0          | 1       |
| 857.2      | 121     | 1          | 0       |
+------------+---------+------------+---------+

I use R.

Mewtwo
  • 2
    "A first quick thought was to predict the log(duration) with some method like linear regression, regression decision tree or SVR but then again when exponentiating the results in order to make them interpretable as seconds, I will come up to negative time too." Are you really sure of this? How can the exponential of a real-valued logarithm be negative? – Firebug Nov 23 '17 at 09:54
  • 1
    The exponential distribution is often used to model waiting times and may make sense here. Look at exponential regressions. – Stephan Kolassa Nov 23 '17 at 11:10
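
Firebug's point can be checked numerically. The following is a minimal sketch on simulated data (the data, coefficients, and the `new_tasks` values are made up for illustration and are not the asker's). It fits a linear model to log(duration) and back-transforms the predictions with `exp()`, which can only return positive values.

```r
# Simulated training data (hypothetical; for illustration only)
set.seed(1)
n <- 200
actions <- sample(20:300, n, replace = TRUE)
task_catB <- rbinom(n, 1, 0.5)
dur <- exp(3 + 0.01 * actions + 0.3 * task_catB + rnorm(n, sd = 0.4))

# Ordinary least squares on the log scale
fit_log <- lm(log(dur) ~ actions + task_catB)

# Predictions on the log scale can be any real number
new_tasks <- data.frame(actions = c(1, 50, 400), task_catB = c(0, 1, 0))
log_pred <- predict(fit_log, newdata = new_tasks)

# Back-transforming with exp() always yields strictly positive seconds
sec_pred <- exp(log_pred)
print(sec_pred)
```

One caveat: if the errors on the log scale are roughly normal, `exp()` of the fitted value estimates the median duration rather than the mean.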

1 Answer


Your problem doesn't really need complex machine learning algorithms. I think one good option is Stephan's suggestion: an exponential regression.

This means you assume your dependent variable follows an exponential distribution, with p.d.f. $$f(x) = \lambda \exp(-\lambda x)\, I_{(0, \infty)}(x)$$

where $\lambda$ is the occurrence rate per unit of measurement, which can be time, distance, etc.
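
As a rough sketch of how this could look in R: the exponential distribution is a gamma distribution with shape 1, so one way to fit an exponential regression is `glm()` with `family = Gamma(link = "log")` and the dispersion fixed at 1 when summarising. With the log link the model is $E[Y \mid x] = 1/\lambda(x) = \exp(x^\top \beta)$, so predictions are always positive. The data below are simulated and the coefficients are invented for illustration. Only one dummy (`task_catB`) is included, on the assumption that there are two categories and the other acts as the reference level.

```r
# Simulated data with exponentially distributed durations (hypothetical values)
set.seed(42)
n <- 500
actions <- sample(20:300, n, replace = TRUE)
task_catB <- rbinom(n, 1, 0.5)
mean_dur <- exp(4 + 0.008 * actions + 0.5 * task_catB)  # E[dur] = 1 / lambda
dur <- rexp(n, rate = 1 / mean_dur)
train <- data.frame(dur, actions, task_catB)

# Gamma GLM with log link; fixing dispersion = 1 in summary() gives
# the standard errors of an exponential regression
fit_exp <- glm(dur ~ actions + task_catB,
               family = Gamma(link = "log"), data = train)
print(summary(fit_exp, dispersion = 1))

# Predicted mean duration in seconds for two hypothetical new tasks
new_tasks <- data.frame(actions = c(121, 257), task_catB = c(0, 1))
pred_sec <- predict(fit_exp, newdata = new_tasks, type = "response")
print(pred_sec)
```

If the exponential assumption looks too restrictive (its variance is tied to the square of the mean), dropping `dispersion = 1` gives an ordinary gamma regression with the same mean structure.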

Nick Cox
Bruna w