
I'm testing throttle position sensors (TPS) that my business sells, and I print a plot of the voltage response versus the throttle shaft's rotation. A TPS is a rotational sensor with $\approx$ 90° of range, and its output is like a potentiometer's: full open reads 5V (or whatever the sensor's input voltage is) and the initial opening reads some value between 0 and 0.5V. I built a test bench with a PIC32 controller to take a voltage measurement every 0.75°, and the black line connects these measurements.

One of my products has a tendency to make localized, low-amplitude variations away from (and under) the ideal line. This question is about my algorithm for quantifying these localized "dips": what is a good name or description for the process of measuring the dips? (A full explanation follows.) In the picture below, the dip occurs in the left third of the plot and is a marginal case as to whether I would pass or fail this part:

Print out of a suspect part

So I built a dip detector (see my Stack Overflow Q&A about the algorithm) to quantify my gut feeling. I initially thought I was measuring "area". The graph below is based on the printout above and is my attempt to explain the algorithm graphically. There is a dip lasting 13 samples, between samples 17 and 31:

Sampled data shown with the "dip" magnified

Test data goes in an array and I make another array for "rise" from one data point to the next, which I call $deltas$. I use a library to get the average and standard deviation for $deltas$.
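To make that step concrete, here is a minimal Python sketch of it (the real bench code is C on the PIC32, and the sample values here are placeholders, not my actual readings):

```python
import statistics

# Placeholder voltage readings, one per 0.75 degree step
samples = [0.42, 0.45, 0.49, 0.52, 0.57, 0.61]

# deltas[i] is the rise from sample i to sample i+1
deltas = [b - a for a, b in zip(samples, samples[1:])]

avg_delta = statistics.mean(deltas)
std_delta = statistics.stdev(deltas)
```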

The analysis of the $deltas$ array is represented in the graph below, where the slope of the graph above has been removed. Originally, I thought of this as "normalizing" or "unitizing" the data, since the x-axis steps are equal and I'm now working solely with the rise between data points. When researching this question, I recalled that this is the derivative, $\frac{dy}{dx}$, of the original data.

Analysis of the derivative...?

I walk through $deltas$ to find sequences of 5 or more adjacent negative values. The blue bars are a series of data points that are below the average of all $deltas$. The values of the blue bars are:

$0.7 + 1.2 + 1.3 + 1.4 + 1.8 + 2.5 + 2.9 + 3.0 + 2.5 + 2.0 + 1.5 + 1.0 + 1.2$

They sum to $23$, which represents the area (or the integral). My first thought is "I just integrated the derivative" which should mean I get back the original data, though I'm certain there's a term for this.

The green line is the average of these "below-average values", found by dividing the area by the length of the dip:

$23 \div 13 = 1.77$
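Roughly, the dip detector does something like the sketch below (Python for illustration, continuing from the earlier snippet; here a dip is taken to be 5 or more consecutive deltas below the average, and the helper name is mine):

```python
MIN_DIP_LEN = 5

def find_dips(deltas, avg_delta, min_len=MIN_DIP_LEN):
    """Return (first index, last index, area, mean depth) for each dip."""
    dips = []
    start = None
    for i, d in enumerate(deltas + [avg_delta]):   # sentinel closes a trailing dip
        if d < avg_delta:
            if start is None:
                start = i                          # a possible dip begins here
        else:
            if start is not None and i - start >= min_len:
                shortfall = [avg_delta - deltas[j] for j in range(start, i)]
                area = sum(shortfall)              # the "integral", e.g. 23
                dips.append((start, i - 1, area, area / len(shortfall)))  # green line, e.g. 1.77
            start = None
    return dips
```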

Over the course of testing 100+ parts, I came to decide that dips whose green-line average is less than $2.6$ are acceptable. The standard deviation calculated across the entire data set wasn't a strict enough test for these dips: without enough total area, they still fell within the limit I had established for good parts. I observationally chose a standard deviation of $3.0$ as the highest I would allow.

Setting a standard-deviation cutoff strict enough to fail this part would be so strict that it would also fail parts which otherwise appear to have a great plot. I also have a spike detector, which fails the part if any $|\text{delta} - \text{avg}| > \text{avg} + \text{std dev}$.
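Putting those limits together, the pass/fail decision looks roughly like this sketch (the 2.6 and 3.0 limits are the observational values above, dips is the list returned by the dip detector sketched earlier, and the function name is mine):

```python
def part_passes(deltas, avg_delta, std_delta, dips, dip_limit=2.6, std_limit=3.0):
    if std_delta > std_limit:                                    # whole-plot roughness
        return False
    if any(mean_depth >= dip_limit for *_, mean_depth in dips):  # localized dip too deep
        return False
    if any(abs(d - avg_delta) > avg_delta + std_delta for d in deltas):  # spike detector
        return False
    return True
```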

It's been almost 20 years since Calc 1, so please go easy on me, but this feels a lot like when a professor used calculus and the displacement equation to explain how, in racing, a competitor with less acceleration who maintains higher corner speed can beat a competitor with greater acceleration to the next turn: having gone through the previous turn faster, his higher initial speed means the area under his velocity curve (his displacement) is greater.

To translate that to my question, I feel like my green line would be like acceleration, the 2nd derivative of the original data.

I visited Wikipedia to re-read the fundamentals of calculus and the definitions of derivative and integral, and learned that the proper term for adding up the area under a curve via discrete measurements is numerical integration. Much more googling on the average of the integral led me to the topics of nonlinearity and digital signal processing. Averaging the integral seems to be a popular metric for quantifying data.

Is there a term for the Average of the Integral ($1.77$, the green line)?
... or for the process of using it to evaluate data?

Krista K
  • I think "average dip" is good enough. It doesn't have the dimensions of acceleration, so it's certainly not anything to do with that. – ShreevatsaR Oct 19 '13 at 03:49
  • And I would appreciate any observations or commentary about this topic as a whole. I am a bit disturbed at how this "gut feeling" measurement isn't better expressed mathematically. – Krista K Oct 30 '13 at 06:21
  • Could you possibly add in all the data points that you used to construct the ideal line, or add in a little bit more information about how the dotted red line is computed to justify the blue bars being the "deltas who are below the average of all the data points"?

    If it is morally the average distance from the average, then there should be an acceleration-style name for it, replacing of course differentiation with taking an average.

    – Glen Wheeler Oct 30 '13 at 21:30
  • @GlenWheeler thank you for the suggestion; it made me realize many things. Massive edit ensued, I hope this helps. – Krista K Oct 31 '13 at 02:06
  • "Mean deviation" IMHO – Dan Stowell Oct 31 '13 at 17:37
  • Migrated from Math.SE by OP request: http://meta.stats.stackexchange.com/questions/1845/how-do-i-go-about-asking-this-question-here-on-stats-se – Willie Wong Nov 01 '13 at 11:34
  • @DanS but do either of those methods imply differentiation then integration? I hear "mean deviation" and think of averaging some kind of standard deviation. The big problem is people have been studying numbers for so long and all the good words already have meanings (implied, explicit, or to lay people like me). – Krista K Nov 01 '13 at 22:01
  • Well, what you call integration seems to me to be actually just taking the average (the mean) of the blue bars, and the blue bars represent the deviation from the expected. So in my understanding, you're using the average to find an "ideal", and then finding the mean deviation from that ideal, for a segment. "Mean deviation" seems fine to me without needing to say anything about integration. (But sorry if I'm barking up the wrong tree.) Also, check out RMS (Root Mean Square) if you haven't already, a widespread measure in signal processing which finds a kind of average magnitude. – Dan Stowell Nov 02 '13 at 20:44
  • What is the ultimate aim here? To provide some way to identify parts that aren't performing adequately? There are a number of ways of measuring deviation from the ideal line. – Glen_b Nov 03 '13 at 03:18
  • @Glen_b yes, and that was accomplished; I want to know a better name for describing the process. "Mean" and "Mean Deviation" don't seem descriptive enough for me. – Krista K Nov 03 '13 at 07:48
  • I'm still trying to clearly understand what you're doing. How, for example, you choose the endpoints of the piece you cut out from the bigger diagram. That kind of thing may affect what one might call it, but 'mean deviation from ideal' would be appropriate if you applied it over the full range of values. – Glen_b Nov 03 '13 at 07:58
  • @Glen_B sorry about the confusion and thanks for sticking with this. I already have analyzed all 120+ data points. Then to see if this localized "dip" happens, I go back & step through the data to find any dips. I edited the question to try and rephrase. Some time away helped me see lapses in my explanation. – Krista K Nov 04 '13 at 19:56
  • There appear to be two steps here. Step 1 is 'identify a dip, if present' and step 2 is 'measure the dip'. While you're after a name for the second part, what it actually is (and hence what, if anything, it might be called) may be influenced by step 1, and I still don't see what the criteria for concluding 'this is a dip we should measure' are (you might have said it up there, but if you did I missed it). – Glen_b Nov 04 '13 at 21:41
  • Is this quote (from your linked post) the definition of a 'dip' for the purpose of this question: "5 or more negative deltas in a row"? – Glen_b Nov 04 '13 at 21:45
  • Hi Glen, yes, that is how the software first determines a dip is happening. – Krista K Nov 04 '13 at 22:22
  • I might add the word "local" to make it clear that step 1 exists -- I agree with @Glen_b (another Glen -- hi!) that this is important. So I would tentatively suggest "local mean defect" where I just concatenated "deviation from the ideal" to "defect". Seems suitable. – Glen Wheeler Nov 10 '13 at 04:43
  • Hi Chris K, I won't see your reply if you don't @Glen_b me. Luckily Glen Wheeler got my attention, or I'd never have known you replied. – Glen_b Nov 10 '13 at 08:06
  • Glen_b & @GlenWheeler : a year later, I'm going over all of this again and I think I'll go with "local mean defect". Glen_B if you really want, I can see about digging up the original array of ADC values, but I suspect we're all past that point. ;) – Krista K Dec 10 '14 at 21:33
  • It's great that this lengthy comment exchange led to something good. Usually they don't. ;) – Glen Wheeler Dec 18 '14 at 00:28

1 Answer


First of all, this is a great description of your project and of the problem. And I am a big fan of your home-made measurement framework, which is super cool... so why on earth does it matter what you call "averaging the integrals"?

In case you are interested in some broader positioning of your work, what you would like to do is often referred to as Anomaly detection. In its simplest setting it involves comparing a value in a time-series against the standard deviation of the previous values. The rule is then $$x[n] > \alpha \, SD(x[1:n-1]) \Rightarrow x[n]\text{ is an outlier,}$$ where $x[n]$ is the $n^{th}$ value in the series, $SD(x[1:n-1])$ is the standard deviation of all previous values between the $1^{st}$ and $(n-1)^{th}$ value, and $\alpha$ is some suitable parameter you pick, such as 1 or 2, depending on how sensitive you want the detector to be. You can of course adapt this formula to work only locally (on some interval of length $h$): $$x[n] > \alpha \, SD(x[n-h-1:n-1]) \Rightarrow x[n]\text{ is an outlier.}$$
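A minimal sketch of that local rule in Python (assuming a plain list of values; alpha and h are the tuning parameters) might look like this:

```python
import statistics

def local_outliers(x, alpha=2.0, h=10):
    """Flag x[n] when it exceeds alpha times the SD of the h previous values."""
    flagged = []
    for n in range(h, len(x)):
        window = x[n - h:n]                     # the h values preceding x[n]
        if x[n] > alpha * statistics.stdev(window):
            flagged.append(n)
    return flagged
```

(Using abs(x[n]) instead of x[n] makes the rule symmetric, if you care about deviations in both directions.)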

If I understood correctly, you are looking for a way to automate the testing of your devices, that is, declare a device as good/faulty after it has performed the entire test (drawn the entire diagonal). In that case, simply consider the above formulas as comparing $x[n]$ against the standard deviation of all the values.

There are also other rules you might want to consider for the purpose of classifying a device as faulty:

  • if any deviation (delta) is greater than some multiple of the SD of all deltas
  • if the square sum of the deviations is larger than a certain threshold
  • if the ratio of the sums of the positive and negative deltas is not approximately one (which might be useful if you prefer smaller errors in both directions rather than a strong bias in a single direction)

Of course you can find more rules and combine them using Boolean logic, but I think you can get very far with the three above.
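As a rough illustration, the three rules could be sketched like this (the thresholds k, sq_limit and ratio_tol are placeholders you would tune on your own labelled parts):

```python
import statistics

def looks_faulty(deltas, k=3.0, sq_limit=50.0, ratio_tol=0.2):
    avg = statistics.mean(deltas)
    sd = statistics.stdev(deltas)

    # 1. any single delta deviates from the mean by more than k standard deviations
    rule1 = any(abs(d - avg) > k * sd for d in deltas)

    # 2. the sum of squared deviations exceeds a threshold
    rule2 = sum((d - avg) ** 2 for d in deltas) > sq_limit

    # 3. positive and negative deviations are strongly unbalanced
    pos = sum(d - avg for d in deltas if d > avg)
    neg = -sum(d - avg for d in deltas if d < avg)
    rule3 = abs(pos - neg) > ratio_tol * (pos + neg)

    return rule1 or rule2 or rule3
```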

Last but not least, once you set it up, you will need to test the classifier (a classifier is a system/model mapping an input, in your case the data of each device, to a class, here either "good" or "faulty"). Create a testing set by manually labelling the performance of each device. Then look into ROC analysis, which basically shows you the trade-off between how many of the faulty devices your system correctly flags and how many good devices it flags by mistake.
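If scikit-learn is available, a minimal sketch of that last step could look like this (the labels and scores below are placeholders; the score could be, for instance, the green-line mean dip depth of each part):

```python
from sklearn.metrics import roc_curve, auc

labels = [0, 0, 1, 0, 1, 1, 0]                 # manual labels: 0 = good, 1 = faulty
scores = [1.1, 0.9, 2.8, 1.5, 3.2, 2.7, 1.0]   # the measure you rank parts by

fpr, tpr, thresholds = roc_curve(labels, scores)
print(auc(fpr, tpr))   # area under the ROC curve; 1.0 means perfect separation
```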

  • I believe "why on earth it matters" is a function of your own username. :) Why? Same reason there is an iliac crest: we need words to distinctively quantify everything unique in life. Imho, this QA is an example of how limited the vocabulary is within statistics. We need to combine confusing or contradictory descriptors for what is "to the eye" so simple. – Krista K Dec 28 '13 at 09:28
  • Hehe, well spotted Sir! :) If I omitted any ventures into the land of creative branding it was merely because I felt compelled to support the resourcefulness and dedication of your effort and ideas rather than to concoct vain labels. Since you insist on naming the mean of the integral, beware that what you consider the "mean of the integral" is a simple mean of your deltas. And as such, your outliers are simply "deviations from the mean", or possibly deviations from the local mean. I don't quite see the advantage of thinking in integrals, unless you don't have enough sampling points. – means-to-meaning Jan 04 '14 at 00:44