3

I have two 1D finite domains consisting of $N$ points each. In one domain the points are all regularly spaced, but in the other they are irregularly spaced. I would like to somehow quantify just how irregular the irregular domain is. Is there a method to quantify this?

Any hint/help is appreciated. Thanks.

EDIT: An example of the 1D irregular data set is $1, 1.7,2.3,4.1,4.2,4.7,5,5.8,8$.

BillyJean

2 Answers

2

You could look at the distribution of the differences between consecutive points. The variance of those differences gives you some idea of how consistently spaced the points are: low variance means evenly spaced, high variance means unevenly spaced.

Of course, this won't capture more complex behaviors, like two evenly spaced sequences separated by a large gap. Comparing that to an unevenly spaced sequence over a small interval, you'd have a hard time drawing a meaningful conclusion using just a simple variance of differences.
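To make the idea concrete, here is a minimal sketch in Python using the example data from the question. The function name `gap_irregularity` is made up for illustration, and returning the coefficient of variation (standard deviation of the gaps divided by the mean gap) alongside the raw variance is an added assumption, not part of the suggestion above; it is included only so that the score does not depend on the overall length of the interval.

```python
from statistics import mean, pvariance

def gap_irregularity(points):
    """Return (variance, coefficient of variation) of consecutive gaps.

    Hypothetical helper: assumes at least two distinct points.
    """
    xs = sorted(points)
    # Differences between consecutive points.
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    var = pvariance(gaps)            # population variance of the gaps
    cv = var ** 0.5 / mean(gaps)     # scale-free version (an assumption)
    return var, cv

# Data set from the question.
irregular = [1, 1.7, 2.3, 4.1, 4.2, 4.7, 5, 5.8, 8]
# Same number of points, evenly spaced over the same interval [1, 8].
regular = [1 + 0.875 * i for i in range(9)]

print(gap_irregularity(regular))
print(gap_irregularity(irregular))
```

The regular grid should give a variance of zero, and the irregular set a clearly positive value. As noted, a single number like this still cannot tell apart the more complex cases, such as two regular runs separated by one large gap.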

  • I think variance is a good idea. Variance of the differences between consecutive points, however, seems counterintuitive to me, and it also doesn't generalize well to higher dimensions. Why not just take the variance of the points? Say you have points $\{x_i\}_{i=1}^{n} \subset \mathbb{R}^d$. Calculate their mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. Your measure of irregularity will be the variance: $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$. High variance means spread out, low variance means clustered. Of course, this is far from satisfactory if you have (as mentioned by @matt) more complex structures. – Yair Daon Aug 25 '15 at 13:47
  • I tried looking at both the variance of the data points and of the difference between data points, but they gave no conclusive answer – BillyJean Aug 25 '15 at 14:04
  • "Consecutive points" could generalize to higher dimensions if you instead took the distance to some number of nearest neighbors. I don't think variance of the points alone would work as well, since points equally spaced over a small interval would get a lower "score" than points equally spaced over a large interval, when they ought to be the same. – Nuclear Hoagie Aug 25 '15 at 15:31
0

Another suggestion would be to perform a goodness-of-fit test. You can bin the data and use a simple chi-square test (as suggested here) to compare goodness of fit to a (binned) uniform distribution. A better fit means more regularly spaced data.
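As a rough sketch of what the binning approach might look like, the Python below computes only the chi-square statistic against equal expected counts per bin. The function name `chi_square_uniform`, the choice of equal-width bins spanning the minimum to the maximum of the data, and the bin count of 3 are all assumptions made for illustration. Turning the statistic into a p-value would require comparing it against a chi-square reference distribution, which is not done here, and with only 9 points that approximation would be crude anyway.

```python
def chi_square_uniform(points, n_bins):
    """Chi-square statistic of binned counts against a uniform distribution.

    Hypothetical helper: bins are equal-width over [min(points), max(points)].
    """
    lo, hi = min(points), max(points)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in points:
        # Clamp so that the maximum lands in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    expected = len(points) / n_bins  # uniform: same expected count per bin
    return sum((c - expected) ** 2 / expected for c in counts)

# Data set from the question, and an evenly spaced set over the same interval.
irregular = [1, 1.7, 2.3, 4.1, 4.2, 4.7, 5, 5.8, 8]
regular = [1 + 0.875 * i for i in range(9)]

print(chi_square_uniform(regular, 3))
print(chi_square_uniform(irregular, 3))
```

A smaller statistic corresponds to a better fit to the binned uniform distribution. The result depends on the number of bins, so it is worth trying a few values before drawing conclusions.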

Yair Daon