
Those lines are based on a lowess fit of points I have.

I would like to be able to extrapolate those lines up to x=500 for instance. I would like to extend this plot and see if those lines reach a plateau or not.

Is there a way to do this?

The CSV data for the blue line is here: http://db.tt/GHhAGLtM

[plot: the lowess-fitted lines over the data points]

The idea is to fill this plot with extrapolated dashed lines. [sketch: the same plot with the lines extended as dashed extrapolations]

  • Extrapolation of a fitted curve is something one should only do very carefully. Your extrapolation is only as good as the model that you used to fit the curve in the first place. If you have a mechanistic model in mind (i.e. this is some natural world phenomenon and you can model it from sound theoretical principles) then you can do it. Otherwise here be dragons. If you want some more help, please describe in words what the data means. – Andrie Apr 25 '12 at 11:47
  • Voted off-topic as this is a statistics question and can hence be better asked at crossvalidated.com. Short answer: You can't extrapolate from a lowess curve, as that is a LOCAL regression. You need to use different models, and data that fits your curve perfectly or the uncertainty on the Y value at X=500 will be so large that you can't draw any conclusions. – Joris Meys Apr 25 '12 at 12:31
  • Flagged the question for moving it to crossvalidated.com. Some moderator will take care of it shortly, no need to repost the question there. – Joris Meys Apr 25 '12 at 12:32
  • To be honest I find your comments a bit harsh. I am trying to learn how I can extrapolate such a curve with R. This is not a statistics question per se. And you don't really have to know what the data means. In short I find your actions (closing this topic) completely outside the concept of stackoverflow. –  Apr 25 '12 at 14:04
  • Sometimes the truth seems harsh. A local regression is as the name implies ... an estimate determined by the local data. You need theory if you want to extrapolate far beyond your data. lowess plots explicitly eschew theory. – DWin Apr 25 '12 at 14:08
  • @BenoitB. I'm not closing it, I'm merely moving it to a place where people will be able to help you more. And I'm very sorry, you cannot extrapolate a local regression model (see the comment of DWin). It says so explicitly in every textbook on the topic. If you want to predict values WITHIN the X domain, use loess instead of lowess. loess has a predict function. If you use that to predict values outside the domain of X, get decent statistical advice. Which is why I direct you kindly to crossvalidated.com. – Joris Meys Apr 25 '12 at 15:50
  • Benoit B. has a valid question: he'd like to know how to do a specific task on a specific platform, rather than whether it makes sense to do his task. For that reason, the question should have stayed at stackoverflow. (I can think of some reasons to extrapolate a lowess fit - but 'predicting data' is not one of them!). The trouble is, I don't think it can be done in R. lowess() decides on a function to fit the data and then returns the fitted curve, but (as far as I can tell) no information about the function that generated the fit. Without the function, you can't extrapolate. – Drew Steen Apr 25 '12 at 16:39
  • Without additional information and assumptions, @Benoit, literally any function could justifiably be used to extrapolate your curves out to $x=500$. Thus, the answer to your question of how to use R for extrapolation is simply to plot any function R can compute in the range from $110$ to $500$. Now, if you would care to describe the data and the purpose of your curve-fitting exercise, then this stats community could provide some good advice for narrowing down your choices to a reasonable range. – whuber Apr 25 '12 at 16:49
  • Not sure if I understand the LOWESS function correctly, nor if this is the correct way to phrase this, but: could you use a linear local regression, weighted by some function with infinite support and convergent tails (like the t-distribution with low df), such that outside the fitting domain, all the weights converged? Then I think the regression would converge to a global linear regression, which can be useful for extrapolation, but you'd still get the benefit of better interpolation. Not sure if the t-distribution would be a suitable weighting function though. – naught101 Dec 11 '18 at 04:36

3 Answers


Curiously, I just addressed a similar question here, although that was in the context of a standard linear model instead of loess. Reading that may give you some of the background ideas. I will take the substance of this question to pertain specifically to loess per se.

The theory behind loess is to have a semi-parametric fit that yields a predicted value based only on a few nearby points, weighted by proximity. There is typically a bandwidth argument that gives the range of $x$ values that would be considered 'nearby' (although this may be determined automatically, or set by default). The weights on any existing data point outside this window will be 0. Moreover, whatever the bandwidth is set to, it will certainly not be wider than the width of your data set. Thus, the ideas behind loess absolutely preclude extrapolating a predicted value for $x=500$ from a loess fit based on data that range over $[0, 100]$.

Even beyond this, however, the predicted value is the one generated when the window is centered on the $x$ value to be used for the prediction; other possible predicted values, when that $x$ is within the window but not in the exact center, are not used or given any weight. You can see how this leads to complications as the window moves towards the ends of the range of $x$; it is often considered that loess is less reliable at the extremes of your existing $x$ range.

These facts should make it clear that loess, unfortunately, cannot be used for extrapolating. It is possible that a parametric model could, but my first reaction would be to be very wary even in that case (see my previous answer for a better feel for that). Sorry to be the bearer of bad news...
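To see the first point reflected in R itself, here is a minimal sketch assuming a hypothetical data frame dat with columns x and y whose x values lie roughly in [0, 100]: by default, predict() on a loess fit refuses to extrapolate and returns NA outside the range of the data, and while loess.control(surface = "direct") will mechanically produce a number at x = 500, all of the caveats above apply to it in full.

```r
# Minimal sketch: 'dat' is a hypothetical data frame with columns x and y,
# with x values roughly in [0, 100]
fit <- loess(y ~ x, data = dat)                  # default surface = "interpolate"
predict(fit, newdata = data.frame(x = 500))      # NA: extrapolation is refused

fit_direct <- loess(y ~ x, data = dat,
                    control = loess.control(surface = "direct"))
predict(fit_direct, newdata = data.frame(x = 500))
# Returns a number, but it is just the local polynomial pushed far
# outside the data -- the caveats above still apply
```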


You can get a better feel for what the loess (lowess) algorithm does by running the loess.demo function from the TeachingDemos package, then clicking on the plot to see the weights and window used to predict the curve at that point. This understanding may help you to see why loess is unlikely to give anything meaningful with that much of an extrapolation.
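A minimal sketch of running that demo, assuming your x and y vectors are already loaded (argument details may vary slightly between TeachingDemos versions):

```r
# install.packages("TeachingDemos")   # if not already installed
library(TeachingDemos)

# Interactive: click on the plot to see the local window and the weights
# used for the prediction at that point
loess.demo(x, y)
```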

Theoretically you could set the window wide enough that it could give a prediction at 500, but as the window gets wider the loess fit approaches a straight line, and any window wide enough to generate a prediction at 500 would be pretty much the same as fitting a straight line to your data and extrapolating with that.
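A rough sketch of that point, again with a hypothetical data frame dat holding x and y (surface = "direct" is needed so that predict() will extrapolate at all):

```r
# A loess fit with a very wide span, forced to extrapolate, versus a plain straight line
fit_wide <- loess(y ~ x, data = dat, span = 10, degree = 1,
                  control = loess.control(surface = "direct"))
fit_line <- lm(y ~ x, data = dat)

newx <- data.frame(x = seq(100, 500, by = 50))
cbind(wide_loess    = predict(fit_wide, newx),
      straight_line = predict(fit_line, newx))
# The two columns should be nearly identical: a window this wide is
# essentially a (weighted) straight-line fit
```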

– Greg Snow

Extrapolating from a LOWESS or LOESS fit is a very bad idea.

LOWESS works by fitting a weighted linear model to a local subset of the data.

You find the "N" nearest neighbors to your data point. You then fit either a first order or second order polynomial to the data, weighting the regression based on the distance from the data point.

This local regression model is used to estimate a value for your original data point.

You repeat this same process for all of the data points in your sample.
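A minimal sketch of one such local fit, just to make the mechanics concrete; this illustrates the idea described above rather than the exact internals of R's lowess/loess, and x, y, and the target point x0 are hypothetical:

```r
# One local, distance-weighted straight-line fit around a target point x0
local_fit <- function(x, y, x0, k = 20) {
  d   <- abs(x - x0)
  nn  <- order(d)[seq_len(k)]                  # indices of the k nearest neighbours
  w   <- (1 - (d[nn] / max(d[nn]))^3)^3        # tricube weights: closer points count more
  dat <- data.frame(x = x[nn], y = y[nn])
  fit <- lm(y ~ x, data = dat, weights = w)    # weighted first-order polynomial
  predict(fit, newdata = data.frame(x = x0))   # the smoothed value at x0
}

# Repeating this at every observed x traces out the smoothed curve:
# smoothed <- sapply(x, function(x0) local_fit(x, y, x0))
```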

Interpolation is pretty safe (and easy). Connect all your data points with a PCHIP and you're off to the races.
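For what it's worth, base R can do a PCHIP-like interpolation directly; a small sketch with hypothetical x and y (distinct x values assumed):

```r
# Monotone (Fritsch-Carlson) cubic Hermite interpolation, similar in spirit to PCHIP
f  <- splinefun(x, y, method = "monoH.FC")
xs <- seq(min(x), max(x), length.out = 200)
lines(xs, f(xs))   # overlay the interpolant on an existing plot of the points
```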

On the other hand, extrapolation is a VERY bad idea. Your local model is based on a very small subset of the data. Your projection is going to be incredibly sensitive to whatever type of assumption you make about your data.

Ultimately, you need to make some kind of assumption about how to extrapolate from those last few end points. Consider how radically your estimation might change depending on whether you used linear interpolation as opposed to PCHIP as opposed to a cubic spline.
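A rough illustration of that sensitivity with two off-the-shelf extrapolants, again on hypothetical x and y (distinct x values assumed); the only point is how far apart the two answers at x = 500 can be:

```r
# Straight line through the last ten points vs. a cubic spline pushed past the data
tail_pts <- tail(order(x), 10)                        # indices of the 10 largest x
fit_lin  <- lm(y ~ x, data = data.frame(x = x[tail_pts], y = y[tail_pts]))
fit_cub  <- splinefun(x, y, method = "fmm")           # extrapolates its end cubic

c(straight_line = unname(predict(fit_lin, data.frame(x = 500))),
  cubic_spline  = fit_cub(500))
# Expect these to disagree, often wildly -- that disagreement is the
# assumption you are making, not something the data can tell you
```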