
I'm looking at data for my company, and we have some periods over the last year where the data was not uploaded correctly. In this figure, "mu" is the value of interest and duration is a time in days. The huge spike near 180 days is an artifact, as are the smaller spikes near 140, 200, and 275 days. Because I don't want to smooth out the initial part of the curve near 0 days, I'm having trouble coming up with a good way of smoothing out these artifacts, and have come here looking for suggestions. Thanks in advance!

[Figure: mu vs. duration (days), showing a large artifact spike near 180 days and smaller spikes near 140, 200, and 275 days]

thecity2

1 Answer


If you know the data points are mistakes, you can just delete them. If it is important to capture the artifact (whatever it is), why not add it as an independent variable and include it in a model?
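A minimal sketch of both suggestions, using synthetic data as a stand-in (the spike window around day 180 and the exponential-decay shape are assumptions for illustration, not the asker's actual data):

```python
import numpy as np

# Synthetic stand-in for the "mu vs. duration" series, with an
# artificial spike near day 180 playing the role of the upload artifact.
rng = np.random.default_rng(0)
duration = np.arange(0, 365, dtype=float)
mu = np.exp(-duration / 90.0) + rng.normal(0.0, 0.01, duration.size)
mu[178:183] += 5.0  # the artifact

# Option 1: if the points are known mistakes, simply drop them.
bad = (duration >= 178) & (duration <= 182)  # assumed-known artifact window
mu_clean = mu[~bad]
dur_clean = duration[~bad]

# Option 2: keep the rows but add the artifact as an indicator
# (dummy) variable in a model — here plain least squares on
# an intercept, the duration, and the artifact indicator.
X = np.column_stack([np.ones_like(duration),
                     duration,
                     bad.astype(float)])
beta, *_ = np.linalg.lstsq(X, mu, rcond=None)
# beta[2] absorbs the spike instead of letting it bias the trend fit.
```

Any regression routine (e.g. statsmodels' OLS) would do the same job as the `lstsq` call; the point is only that the indicator column soaks up the artifact's effect.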

Peter Flom
  • I totally agree; if you know that some portion of your sample is corrupted and unrealistic, just excluding it from further analysis is the cleanest thing to do. – usεr11852 May 27 '13 at 00:38
  • Of course, I have thought of deleting the data points, but they are on a production server and spread across numerous variables, so it will not be a simple matter at all. Furthermore, I would still have to interpolate between the other points, but perhaps that could easily be done using splines. For the sake of argument, let's assume the data really can't be "cleaned" by just removing those points. – thecity2 May 27 '13 at 00:53
  • Then please tell us what you actually have, since if you have "numerous" variables you don't have "two" and if you have two you can remove these points easily. – Peter Flom May 27 '13 at 10:16
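The spline route mentioned in the comments can be sketched as follows: drop the known-bad window, fit a cubic spline to the surrounding points, and fill the gap with the spline's values. The artifact window and the data shape are again illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic stand-in series with an assumed artifact window near day 180.
rng = np.random.default_rng(1)
duration = np.arange(0, 365, dtype=float)
mu = np.exp(-duration / 90.0) + rng.normal(0.0, 0.01, duration.size)
mu[178:183] += 5.0  # artifact

# Remove the bad window, fit a spline to the remaining points,
# and interpolate the gap back in.
bad = (duration >= 178) & (duration <= 182)
spline = CubicSpline(duration[~bad], mu[~bad])
mu_filled = mu.copy()
mu_filled[bad] = spline(duration[bad])  # spline values replace the spike
```

This leaves every point outside the artifact window untouched, which also addresses the concern about not smoothing the steep initial part of the curve near 0 days.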