Possible Duplicate:
Simple algorithm for online outlier detection of a generic time series
Getting rid of spikes in sample data
How can I get rid of sparky (i.e. spiky) values in a discrete data set, but in a "smoothed-out" manner rather than by simply cutting them off?
Take, for instance:

There are two sparks at around 20000, but the next one, at 600, is also considered a spark.
I've managed to bring the very high ones down to zero by passing the normalized values through a beta density:
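# Shape parameters of the beta distribution
a = 2
b = 5
beta_dist = RealDistribution('beta', [a, b])
# Normalize by 19968 so the values fall in the beta distribution's support [0, 1]
f(x) = x / 19968
# insertions is the raw data series (not shown here)
normalized_insertions = [f(i) for i in insertions]
# Pair each normalized value with the beta density evaluated at that value
insertions_pairs = [(i, beta_dist.distribution_function(i)) for i in normalized_insertions]
# Overlay the data points on the density curve
plot_b = beta_dist.plot()
show(list_plot(insertions_pairs)+plot_b)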
I have no idea how to handle the lower ones. The maximum should be reached at 100; perhaps the parameters of the beta distribution need a little more tweaking?
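One thing that may help with the tweaking: for a > 1 and b > 1, the density of Beta(a, b) peaks at (a - 1) / (a + b - 2). With a = 2 and b = 5 that is 1/5, i.e. a raw value near 3994 after undoing the division by 19968, which is far from 100. Below is a minimal sketch in plain Python (not Sage-specific calls) that solves for b so the peak lands at 100 while keeping a = 2. The helper names `beta_mode`, `b_for_mode`, and `beta_pdf`, and the constants `SCALE` and `TARGET_PEAK`, are made up for illustration.

```python
import math

def beta_mode(a, b):
    """Mode of Beta(a, b); valid only when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def b_for_mode(a, mode):
    """Solve (a - 1) / (a + b - 2) = mode for b, holding a fixed."""
    return (a - 1) / mode - a + 2

def beta_pdf(x, a, b):
    """Beta(a, b) density on [0, 1]; the normalizer is computed via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)

SCALE = 19968      # normalization constant used in the question
TARGET_PEAK = 100  # raw value at which the density should peak

a = 2
b = b_for_mode(a, TARGET_PEAK / SCALE)  # about 199.68 for these numbers

print("b =", b)
print("peak at raw value", beta_mode(a, b) * SCALE)
```

A b this large makes the density very narrow around the peak, so values at 600 are pushed down strongly as well as the ones near 20000. Whether that much suppression is wanted depends on the data; raising a and b together while keeping the same mode would narrow the peak further.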
Currently, it looks like this:

If possible, use Sage as a reference for your explanations.
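For comparison, a different and fairly common technique for this kind of problem is a rolling-median filter: flag points that sit far from the median of their neighbours and replace only those. This is not the beta-density approach above, just a rough sketch of the alternative in plain Python. The sample series, the function names, and the `window` and `threshold` parameters are all invented for illustration.

```python
from statistics import median

def rolling_median(values, window=5):
    """Median of a centered window around each point (window shrinks at the edges)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def despike(values, window=5, threshold=3.0):
    """Replace points far from the local median; leave all other points untouched."""
    baseline = rolling_median(values, window)
    residuals = [abs(v - m) for v, m in zip(values, baseline)]
    # Typical deviation from the local median, used as a robust scale.
    # If it is 0, every point that differs from its local median is replaced.
    mad = median(residuals)
    return [m if r > threshold * mad else v
            for v, m, r in zip(values, baseline, residuals)]

# Made-up series: a level around 100 with two large spikes and one smaller one
sample = [90, 100, 110, 20000, 95, 105, 600, 100, 98, 20000, 102]
cleaned = despike(sample)
print(cleaned)
```

Because the median ignores extreme values inside the window, both the 20000 spikes and the 600 spike are replaced by nearby typical values, while the ordinary points pass through unchanged.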
You need to explain precisely what you are doing and supply your rationale before we can intelligently evaluate what you have done, or even suggest a better approach.
– Michael R. Chernick Sep 14 '12 at 15:08