Out of curiosity, I want to understand how to model this problem. I've heard people suggest linear regression, but I am not sure how to encode the problem in R (my attempt is included below), as I am a complete beginner in this area.
I have a task that can be run any number of times (each run is a task instance). Every time a task instance completes another 1%, I record the time elapsed since the instance started. So for each instance I have about 100 points (one per 1% increment) at which the elapsed time was recorded.
Given that I have this data for many instances, is it possible to predict the finish time of a new task instance?
TaskID Percent TimeElapsed
1: 1 0 0.2035333
2: 1 1 0.2062833
3: 1 2 0.2137167
4: 1 3 0.2180833
5: 1 4 0.2490833
---
3127: 31 96 4.9391667
3128: 31 97 4.9970500
3129: 31 98 5.5644500
3130: 31 99 5.6532667
3131: 31 100 5.8359833
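For reference, here is roughly how I read the data in and plot one trajectory per task (a sketch only; the file name is a placeholder for my paste, and ggplot2 is just what I happened to use):

# Sketch: load the dput() output and plot one elapsed-time trajectory per task.
library(data.table)
library(ggplot2)

dt <- as.data.table(dget("task_times_dput.R"))   # placeholder file name; columns TaskID, Percent, TimeElapsed

ggplot(dt, aes(Percent, TimeElapsed, group = TaskID)) +
  geom_line(alpha = 0.4) +
  labs(x = "Percent complete", y = "Time elapsed")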
A quick look at the task behavior (below) tells me there is a fair amount of variance in how the task behaves, so it's hinting that the output should not just be a time prediction but a time prediction with some confidence attached?
In addition, I'm thinking that just using the information about the current progress of the task might not be sufficient - the task may have slowed down at some of its earlier progress points, which would affect the finish time. Should this information somehow be encoded into the model as well?
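For instance, I could imagine using the elapsed time at the current percentage as a predictor of the total time, something along these lines (an untested sketch; p_now and t_now are made-up values for a hypothetical new instance):

# Sketch: predict total time from the elapsed time the new instance has at its
# current percentage, using the historical instances at that same percentage.
library(data.table)

totals <- dt[Percent == 100, .(TotalTime = TimeElapsed), by = TaskID]

p_now <- 40     # hypothetical: new instance is 40% done
t_now <- 2.1    # hypothetical: seconds elapsed so far for the new instance

train <- merge(dt[Percent == p_now, .(TaskID, TimeAtP = TimeElapsed)], totals, by = "TaskID")
fit   <- lm(TotalTime ~ TimeAtP, data = train)

predict(fit, newdata = data.frame(TimeAtP = t_now), interval = "prediction")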

I am particularly interested in how to do this using R. I have included my initial attempt at linear regression below, but the result does not look good to me. Any suggestions on how to improve it, or on other methods to use?
I have posted the output of dput() (on a data.table: install.packages("data.table")) on pastebin. If you want a data.frame instead, please see this paste.
EDIT: Attempt at using linear regression
The thick black line is the median at every point. The thick red line is the regression line fit to the median line.
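Roughly, what I did is the following (a sketch; I computed the median at each percentage and fit a plain lm() to that median curve):

# Sketch of the attempt: per-percent median (thick black) and a straight-line
# lm() fit to that median curve (thick red).
library(data.table)

med <- dt[, .(MedTime = median(TimeElapsed)), by = Percent]
setorder(med, Percent)
fit <- lm(MedTime ~ Percent, data = med)

plot(dt$Percent, dt$TimeElapsed, col = "grey", pch = 16, cex = 0.4,
     xlab = "Percent complete", ylab = "Time elapsed")
lines(med$Percent, med$MedTime, lwd = 3)     # median at every point (black)
abline(fit, col = "red", lwd = 3)            # regression line fit to the median (red)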


Now the mean prediction at any time is easy - just sum up the means of the remaining time increments. I don't know a good way of doing confidence intervals right now (apart from a bootstrap with, e.g., the mean and variance/covariance at each percentage).
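A simple bootstrap version could just resample whole tasks, something like this (a sketch, assuming a data.table dt with the columns shown in the question and a new instance currently at p_now percent):

# Sketch: bootstrap the mean remaining time beyond p_now by resampling tasks.
library(data.table)

p_now <- 40   # hypothetical current progress of the new instance

remaining <- dt[, .(Remaining = TimeElapsed[Percent == 100] - TimeElapsed[Percent == p_now]),
                by = TaskID]

set.seed(1)
boot_means <- replicate(2000, mean(sample(remaining$Remaining, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))   # approximate 95% CI for the mean remaining time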
A simple way would be to assume each 1% increment is independent normal, then add up the remaining means and variances to get an approximate normal distribution for the remaining time, and take a prediction interval from that.
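In R, with the same dt as in the question, that could look something like this (a sketch under the independence assumption; p_now and t_now are placeholders for the new instance's current state):

# Sketch: per-1% increments, their means and variances across tasks, then a
# normal approximation for the finish time of a partially completed instance.
library(data.table)

setkey(dt, TaskID, Percent)
dt[, Incr := TimeElapsed - shift(TimeElapsed), by = TaskID]   # time taken for each 1% step

stats <- dt[!is.na(Incr), .(m = mean(Incr), v = var(Incr)), by = Percent]

p_now <- 40    # placeholder: current percentage of the new instance
t_now <- 2.1   # placeholder: its elapsed time so far

rest  <- stats[Percent > p_now]
mu    <- t_now + sum(rest$m)      # predicted finish time
sigma <- sqrt(sum(rest$v))        # sd if increments were independent
mu + c(-1.96, 1.96) * sigma       # rough 95% prediction interval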