I have a dataset of $N$ datapoints $\{x_i, y_i\}$, where the $x_i$ are equally spaced over the interval $[0, 1]$ and the $y_i$ are non-negative. It is known that $y_i$ is the sum of a signal and Gaussian noise. The goal is to find scalar metrics $F(\vec{y})$ that measure the center of mass $c$ and the spread $s$ of $y$. The requirements are as follows:
- $c \in [0, 1]$ corresponds to the position of the center of mass on the interval.
- $s$ should be minimal if only one $y_i$ is nonzero, and maximal if the $y_i$ are uniform.
- $s$ should be independent of $c$ as much as possible. For example, a Gaussian of the same variance centered at 0.5 or at 0.1 should give the same spread.
- The metrics $c$ and $s$ should be robust to small perturbations of the data.
So far, I have tried converting $y_i$ into a probability distribution $p_i = \frac{y_i}{\sum_j y_j}$ and estimating the mean and variance of that distribution. The problems with using variance as a measure of spread are as follows:
- Requirement 4 (robustness) does not hold. For example, if a Gaussian profile is used for $y$, adding a small amount of Poisson noise to the system can change the variance estimate by a factor of several.
- Further, if the signal is slightly positive everywhere, then even at reasonable SNR the estimated variance changes little with the true variance. This indicates that the mere presence of non-zero entries has a higher impact on the estimate than the SNR does.
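
For concreteness, here is a minimal sketch of the normalization approach described above, assuming NumPy. The function name `center_and_variance`, the Gaussian test profile, and the offset value are illustrative choices only and are not taken from the actual data.

```python
import numpy as np

def center_and_variance(y):
    """Normalize y to weights p_i; return the weighted mean and variance."""
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, y.size)   # equally spaced x_i on [0, 1]
    p = y / y.sum()                     # p_i = y_i / sum_j y_j
    c = np.sum(p * x)                   # center of mass
    var = np.sum(p * (x - c) ** 2)      # variance, used here as the spread s
    return c, var

# Hypothetical test signal: Gaussian profile of width 0.05 centered at 0.5
x = np.linspace(0.0, 1.0, 201)
signal = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)

c0, v0 = center_and_variance(signal)          # clean profile
c1, v1 = center_and_variance(signal + 0.05)   # same profile plus a small constant offset
```

With these illustrative values, the constant offset inflates the variance estimate several-fold even though the underlying profile is unchanged, which is the behavior described in the second bullet.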
Are there other ways to estimate spread in my case? I emphasize that the spread need not be an estimate of the variance. It just has to satisfy the requirements above.