
I am working on reducing the data from a particular type of particle detector. When struck by a particle, this detector produces a voltage pulse that has the form of a Gaussian convolved with an exponential decay.
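For reference, this shape is what's usually called an exponentially modified Gaussian. Here is a minimal sketch of the pulse model in Matlab (the parameter names A, mu, sigma, tau and the plotted values are my own choices for illustration, not anything standard for this detector):

```matlab
% Gaussian (mean mu, width sigma) convolved with a one-sided exponential
% decay (time constant tau); A is the pulse area. This is the closed form
% of the convolution, written with the complementary error function:
emg = @(t, A, mu, sigma, tau) ...
    (A ./ (2*tau)) .* exp(sigma^2/(2*tau^2) - (t - mu)/tau) ...
    .* erfc((sigma/tau - (t - mu)/sigma) / sqrt(2));

t = linspace(0, 50, 1000);          % arbitrary time axis
plot(t, emg(t, 1, 10, 2, 5));       % illustrative parameter values
xlabel('time'); ylabel('voltage');
```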

Among the earlier work done with this sort of detector, I've found one journal article which suggests that:

Oftentimes, the signal obtained with a current-mode detector will have less than satisfactory statistics and will contain statistical noise as well as fluctuations due to digitization noise. In order to fit this data, it is sometimes more reliable to fit the integral of the signal.

No further justification is given for this statement, and I've never heard of doing this before.

I have no problem implementing this approach, and indeed it works well for me; but I would like to know why and when it is a good idea, how I might have thought of it on my own, and what mathematical justification there is for integral fitting being superior in certain cases. I'm not asking for a highly rigorous proof, so some hand-waving is fine, but I want more than the bare assertion made in this paper.
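For concreteness, here is a stripped-down version of the sort of thing I'm doing, with made-up parameter values and noise level; to keep the snippet self-contained I integrate the model numerically with cumtrapz rather than using my analytical expression for the integral:

```matlab
% Illustrative integral fit (made-up numbers; the real analysis uses a
% closed-form expression for the integrated pulse instead of cumtrapz).
emg = @(t, p) (p(1)./(2*p(4))) ...                 % p = [A mu sigma tau]
    .* exp(p(3)^2/(2*p(4)^2) - (t - p(2))/p(4)) ...
    .* erfc((p(3)/p(4) - (t - p(2))/p(3)) / sqrt(2));

t = linspace(0, 50, 500);
pTrue = [1 10 2 5];                                % invented "true" values
y = emg(t, pTrue) + 0.02*randn(size(t));           % simulated noisy pulse

Y = cumtrapz(t, y);                                % integrate the data once
sse = @(p) sum((Y - cumtrapz(t, emg(t, p))).^2);   % least-squares objective
pFit = fminsearch(sse, [0.8 9 1.5 4])              % crude initial guess
```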

Colin K
  • If the "noise" doesn't have a strong temporal correlation, then the integral consists of the signal plus a random walk, implying its error is (a) strongly autocorrelated and (b) strongly heteroscedastic (its variance increases over time). Moreover, (c) the underlying response should be monotonically increasing. This can indeed be accurately fit, but it's a good idea to use specialized techniques adapted to that model and error structure (a small simulation of this error structure appears after these comments). This leads me to ask: what fitting procedures are you comparing? That is, how would you fit the original data, and how would you fit the integrated data? – whuber May 18 '12 at 20:10
  • @whuber: To address your points: The noise in the raw data should have no temporal correlation, afaik, so I suppose your (a) and (b) will hold for me. The integral is monotonically increasing (ignoring any violation of that due to noise), but the underlying response is definitely not monotonic. It looks, qualitatively, like a Gaussian; the exponentially decaying component adds a slightly longer tail. Thus, the integral looks like an erf(). I'm doing my fitting by nonlinear least squares: specifically, I'm minimizing the sum of squared errors between the data and my fit using fminsearch() in Matlab. – Colin K May 18 '12 at 20:23
  • Just to clarify, since I wasn't explicit about it, I've got an analytical expression for the shape of the integral of the pulse, so the pulse height, Gaussian width, and exponential decay time can be pulled out of the fit and used in the expression for the shape of the pulse itself. – Colin K May 18 '12 at 20:27
  • Offhand it sounds like fitting the raw data would be better than fitting the integral. The answer depends partly on the nature of the noise, though. Could you share a graph? Even better, do you have any visualizations of the residuals of your fit(s)? – whuber May 18 '12 at 20:32
  • @whuber: Have a second to chat? – Colin K May 18 '12 at 20:42
  • Just a general comment from me: integration can act like a form of averaging. When averaging a sequence of n iid random variables with variance V, the variance is reduced by a factor of 1/n. Any single observation Xi is an unbiased estimate of the mean m, but the sample mean is far better, since its variance is V/n compared to V for a single Xi. This problem is more complicated because the voltage being integrated over time is correlated over time, but this general smoothing principle could still hold, and I think that may be the basis for the statement the OP quoted (a quick numerical check of the V/n argument follows these comments). – Michael R. Chernick May 18 '12 at 22:59
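A small simulation of the error structure whuber describes, assuming white Gaussian noise on the raw samples (the noise level and trace lengths are arbitrary):

```matlab
% White noise on the raw samples becomes a random walk after integration:
% strongly autocorrelated, with variance growing roughly linearly in time
% (heteroscedastic), so ordinary least squares on the integral no longer
% sees independent, equal-variance errors.
noise = 0.02*randn(100, 1000);     % 100 replicate white-noise traces
walk  = cumsum(noise, 2);          % integrated noise = random walk

plot(var(walk));                   % variance across replicates grows
xlabel('sample index');            % ~linearly along the record
ylabel('variance of integrated noise');
```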
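And a quick numerical check of the V/n averaging argument in the last comment (all numbers arbitrary):

```matlab
% Averaging n iid draws of variance V gives an estimator with variance V/n.
V = 4; n = 25;
X = sqrt(V)*randn(10000, n);       % 10000 repeated experiments of n draws
fprintf('single draw:  var = %.2f (theory %.2f)\n', var(X(:,1)), V);
fprintf('sample mean:  var = %.3f (theory %.3f)\n', var(mean(X,2)), V/n);
```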

0 Answers