I have a very long time series (about a gigabyte in ascii format) that looks like this:

1, 0.5

2, 0.52

3, 0.3

. . .

The points occur at integer time points and are predominantly 1 second apart. A small proportion are missing. The series is known to peak when a certain phenomenon occurs. However, there is underlying noise, which makes it difficult to distinguish real peaks from random fluctuations.

I have an algorithm which has produced about 500 points where peaks may occur.

To see what was happening around each predicted peak position, I extracted a profile of 101 points: the predicted position itself, the 50 points before it, and the 50 points after it. Inspecting by eye, a lot of the profiles didn't look very much like peaks. (I think this is because there is a lot of noise in the series.)

I decided to average all the profiles together. The result is a new series where the first point is the average of all points -50 time steps away from my center point and the last point is the average of all points +50 time steps away from my central point.

My hope was to see a peak close to the center point of the average profile, which I did, approximately +2 time steps from the center point. I take this +2 number as a measure of the accuracy of the peak locating algorithm. (If some of the points in a profile are missing, which happens rarely, I just average the ones that are present and ignore the missing values, so some positions in the average profile are derived from slightly more points than others.) The average profile has a classic peak shape: moving from left to right, the values rise slowly to a peak about 2 positions after the center point and then decline again.
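The averaging described above, including the handling of missing points, can be sketched roughly as follows. This is only an illustration under assumptions: the function name `average_profile` and its parameters are hypothetical, the timestamps are assumed to be integers, and the whole series is assumed to fit in memory as a dense array with `NaN` marking the missing seconds.

```python
import numpy as np

def average_profile(times, values, centers, half_width=50):
    """Average the windows of +/- half_width seconds around each center.

    Hypothetical helper: missing seconds are stored as NaN and skipped,
    so some offsets may be averaged over slightly fewer profiles.
    """
    times = np.asarray(times, dtype=np.int64)
    values = np.asarray(values, dtype=float)
    centers = np.asarray(centers, dtype=np.int64)

    # Lay the series out on a dense 1-second grid; gaps stay NaN.
    t_min = times.min()
    dense = np.full(times.max() - t_min + 1, np.nan)
    dense[times - t_min] = values

    # One row per candidate peak, one column per offset -half_width..+half_width.
    offsets = np.arange(-half_width, half_width + 1)
    idx = centers[:, None] + offsets[None, :] - t_min
    in_range = (idx >= 0) & (idx < dense.size)
    profiles = np.full(idx.shape, np.nan)
    profiles[in_range] = dense[idx[in_range]]

    # Column-wise mean that ignores the missing entries.
    return offsets, np.nanmean(profiles, axis=0)
```

The offset at which the returned average is largest (for example `offsets[np.nanargmax(avg)]`) corresponds to the "+2" figure mentioned above.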

If I pick 500 positions at random instead, the average profile is a relatively flat line, which I interpret as almost constant.

So, I think the peak locating algorithm actually does have some ability to recognize when the series is peaking.

I want a statistical test to tell me when I should judge the average profile as flat and when I should judge it as a peak. Can someone please help?

Henry B.
  • Not quite sure what you want to test. You took your 500 detected peaks and averaged their profiles and get a peak. You took 500 random locations and averaged their profiles and get something fairly flat. Are you trying to compare these two averaged profiles to say that the one is significantly more peaked than the other? Or are you looking for something to use on each detected peak to see how close it is to the average profile? Or... ? – Wayne Dec 05 '11 at 22:46
  • @Wayne: You took your 500 detected peaks and averaged their profiles and get a peak. [YES]; You took 500 random locations and averaged their profiles and get something fairly flat. [YES]; Are you trying to compare these two averaged profiles to say that the one is significantly more peaked than the other? [YES] – Henry B. Dec 05 '11 at 23:30
  • In which case, I think jbowman's answer (via simulation) makes sense. – Wayne Dec 06 '11 at 17:38

1 Answer

If I am understanding the problem... one test would be as follows. Calculate the value at the peak using your peak locating algorithm, call this $T_0$. Then randomly select 500 positions from your data, and calculate the value at the (not real) peak in the same way. Do this 1,000 times or so, depending upon how much time it takes. Keep track of the 1,000 "peak" values $T_1, \dots, T_{1000}$. Then compare $T_0$ to the $T_i$. If $T_0$ is greater than, say, 95% of the $T_i$, then you could reject the null hypothesis that the peak detected by your algorithm is a random artifact at the 5% significance level (95% confidence).
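A minimal sketch of this simulation is given below. It rests on several assumptions that are not stated in the answer: the series is held as a dense array `dense` with `NaN` for missing seconds, all centers lie at least `half_width` points from either end, and the "value at the peak" is taken to be the maximum of the average profile minus its median (the hypothetical `peak_height` function). Any other statistic can be passed in instead.

```python
import numpy as np

def peak_height(profile):
    # Assumed statistic: how far the highest point rises above the typical level.
    return np.nanmax(profile) - np.nanmedian(profile)

def mean_profile(dense, centers, half_width=50):
    # Average the +/- half_width windows around the centers, ignoring NaNs.
    offsets = np.arange(-half_width, half_width + 1)
    return np.nanmean(dense[centers[:, None] + offsets], axis=0)

def monte_carlo_p_value(dense, centers, n_draws=1000, half_width=50,
                        statistic=peak_height, seed=0):
    """Compare T_0 (detected positions) with T_1..T_n (random positions)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=np.int64)
    t0 = statistic(mean_profile(dense, centers, half_width))

    # Random centers are restricted so that every window fits in the series.
    n_valid = dense.size - 2 * half_width
    t_null = np.empty(n_draws)
    for i in range(n_draws):
        random_centers = half_width + rng.choice(
            n_valid, size=centers.size, replace=False)
        t_null[i] = statistic(mean_profile(dense, random_centers, half_width))

    # Share of random draws at least as extreme as T_0 (with the usual +1 correction).
    p_value = (1 + np.sum(t_null >= t0)) / (n_draws + 1)
    return t0, t_null, p_value
```

A `p_value` below 0.05 corresponds to $T_0$ exceeding roughly 95% of the $T_i$. Each draw uses as many random positions as there are detected ones, so the two averages are equally smoothed.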

jbowman
  • I think the issue is that it is "difficult to distinguish real peaks from random noise" because of "false" peaks such as $(0.02, 0.50, 0.12, 0.09, 5.02, 0.88, 0.01, 0.23, 0.11)$ as opposed to what Henry wants to detect, "...a classic peak shape with the values slowly rising to a peak...," such as $(0.50, 1.10, 2.33, 4.44, 5.02, 4.01, 1.87, 0.94, 0.01)$. So $T_i(x)$ needs to be chosen not as the value at the center point of the profile $x$, but instead as a function that captures whether there is a pattern of increasing values before the peak of $x$ (and decreasing values after the peak). – lockedoff Dec 07 '11 at 18:50
  • Perhaps something like the Shannon entropy of absolute first differences. – lockedoff Dec 07 '11 at 18:56
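One possible reading of the entropy suggestion in the last comment is sketched below. The normalisation is an assumption: the absolute first differences are scaled to sum to 1 and treated as a probability distribution, and the function name `diff_entropy` is hypothetical. Under this reading, a lone spike concentrates nearly all the change in one or two steps (low entropy), while a gradual rise and fall spreads it over many steps (higher entropy).

```python
import numpy as np

def diff_entropy(profile):
    """Shannon entropy (in nats) of the normalised absolute first differences."""
    d = np.abs(np.diff(np.asarray(profile, dtype=float)))
    total = d.sum()
    if total == 0:
        return 0.0  # a perfectly flat profile carries no change at all
    p = d[d > 0] / total
    return float(-np.sum(p * np.log(p)))

# The two example profiles quoted in the comment above.
spike = [0.02, 0.50, 0.12, 0.09, 5.02, 0.88, 0.01, 0.23, 0.11]
smooth = [0.50, 1.10, 2.33, 4.44, 5.02, 4.01, 1.87, 0.94, 0.01]
print(diff_entropy(spike), diff_entropy(smooth))
```

On these two profiles the smooth peak scores higher than the spike. Note that pure noise also spreads its changes evenly, so this statistic alone would not separate a smooth peak from a noisy flat profile; it is probably best combined with a height measure.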