I recently wrote a program that graphed data points so that a user could scroll through them and find "interesting" parts of the data.
Now I am looking at ways to make it even simpler by making a table of values deemed to be interesting. These are usually either the value switching to a new value, or spiking and returning to the old one. The main problem is that I have to take into account a lot of noise.
Since I'm already saving hours of data trawling, making it conservative (i.e. is trigger happy) is fine.
Algorithms and theories I'm looking at are k-nearest neighbour and Local Outlier factor.
I am looking at using both, and then collecting them by saying errors will be unique within +-100 samples, giving errors detected by both a higher rating... Or something along those lines.
What algorithms are there I should be looking at? And what would you recommend as the best algorithm?
Edit: More specific information:
The values are read in ~20ms intervals (but to say time is constant is ok), and are between +-10 (as single precision floats). The amount of noise is unknown, but is pretty consistent (i.e. same magnitude of noise throughout), and usually jumps between two values.
A typical set of data is 120000+ records long (so records are read 1 at a time, and the whole set, or even it's size, is not known), changes can occur anywhere in that set, and at any frequency, though the change will usually be >2 (though that fact is not reliable) and aren't usually within 1000 records of each other (again, not reliable). Typically the change or spike is <2 records long, and there are few gradual changes (though the possibility of them cannot be ruled out. Changes and spikes are always obvious to a human, and since this is to point a human in the right direction it only needs to be accurate to +- 100 records.
At the moment I am looking at something along the lines of: 1 - Get the following from 100 records the mean and standard deviation 2 - For each value test if it is within 2 standard deviations of the mean. 3 - if not then print the change and start from 1 again (ignoring 20 records)
That would be simple to implement, and would show gradual changes as being changes. I could add more complex checks to find the type of exception (spike or change) and the 20 doesn't need to be a constant