4

I am searching for a good measure of clustering in time-based events. Say that in a given 5-minute interval, cars randomly pass in front of a house, about one car every 10 seconds (for a total of 30 cars). In another 5-minute interval, 3 cars pass every thirty seconds, and the time between the three cars is 0 (again 30 cars over 5 minutes). The difference between the two intervals is that in the second one the arrivals of cars "cluster". What would be a good measure to capture clustering? I would like a measure that indicates that there was more clustering in the second interval than in the first.

CharlesM
  • 573

2 Answers

4

I think the measure you are looking for is called "clumpiness": non-constant propensity, or more precisely temporary elevations of propensity, characterized by serial dependence and non-stationarity. Take a look at the draft version of New Measures of Clumpiness for Incidence Data by Yao Zhang, Eric Bradlow, and Dylan Small for some alternative ways to measure this sort of thing. The paper starts by defining some desirable properties of clumpiness measures:

  • Minimum - the measure should be at the minimum if the events are equally spaced
  • Maximum - the measure should be at the maximum if all the events are gathered together
  • Continuity - shifting event timing by a very small amount should only change the measure by a small amount
  • Convergence - as events move closer/further apart, the measure should increase/decrease

It then evaluates several existing measures and proposes 4 new ones, comparing them in simulations with various types of data-generating processes (DGPs). There is also an empirical illustration using Hulu data.
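To make the minimum and maximum properties listed above concrete, here is a minimal sketch of one entropy-style score computed on the gaps between consecutive events. It is my own illustration and not necessarily any of the paper's definitions: the function name `entropy_clumpiness`, the choice to use only the gaps between events (ignoring the interval boundaries), and the convention of returning 1 when all events coincide are assumptions.

```python
import math

def entropy_clumpiness(times):
    """Hypothetical score: 1 - H / log(m), with H the entropy of the
    normalized gaps between consecutive events and m the number of gaps.
    Equal gaps give 0; the score grows as the gaps become more unequal."""
    ts = sorted(times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    m = len(gaps)
    if m < 2:
        raise ValueError("need at least three events")
    span = sum(gaps)
    if span == 0:
        # Assumed convention: all events at the same instant is maximal clumping.
        return 1.0
    shares = [g / span for g in gaps]
    # Zero-length gaps contribute nothing to the entropy (0 * log 0 := 0).
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return 1.0 - entropy / math.log(m)

# The two intervals from the question, with times in seconds.
regular = [10 * i for i in range(1, 31)]                      # one car every 10 s
clustered = [30 * k for k in range(1, 11) for _ in range(3)]  # 3 cars every 30 s

print(entropy_clumpiness(regular))
print(entropy_clumpiness(clustered))
```

On the question's example the regular interval scores essentially 0 and the clustered one roughly 0.35, so a score of this kind gives one number per interval that can be compared across intervals.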

dimitriy
  • 35,430
  • thanks for the suggestion! sorry for the late reply! I will check the paper – CharlesM Dec 24 '12 at 12:56
  • In fact, the measure I had in mind was a simple D'Agostino K-squared test, but that would only work if I assumed that my theoretical distribution is normal, whereas I want to assume uniform. – CharlesM Dec 24 '12 at 13:10
0

If the cars arrive according to a Poisson process at 6 cars per minute, the time between cars is exponentially distributed with rate 6 per minute, i.e. a mean of 1/6 of a minute (10 seconds). Compare the inter-arrival times in both examples to that exponential distribution using the Kolmogorov-Smirnov test. The clustered one will be significantly different (provided you have enough data). The regularly spaced one should also be significantly different, as it isn't random, though I don't expect you mean the cars pass precisely every ten seconds.
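As a rough sketch of that comparison, the one-sample KS statistic D against an exponential distribution can be computed by hand. The mean gap of 10 seconds follows from 6 cars per minute; the function name `ks_statistic_exponential` and the idealized gap lists (exact 10-second spacing, exact ties within each cluster) are assumptions made for illustration.

```python
import math

def ks_statistic_exponential(gaps, mean_gap):
    """One-sample KS statistic D: the largest distance between the empirical
    CDF of the gaps and the exponential CDF F(x) = 1 - exp(-x / mean_gap)."""
    xs = sorted(gaps)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x / mean_gap)
        # Check the empirical CDF just after and just before the jump at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Inter-arrival times (seconds) for the two idealized 30-car intervals.
regular_gaps = [10.0] * 29
clustered_gaps = [0.0, 0.0, 30.0] * 9 + [0.0, 0.0]

print(ks_statistic_exponential(regular_gaps, 10.0))
print(ks_statistic_exponential(clustered_gaps, 10.0))
```

In this idealized setup both values of D are large (about 0.63 and 0.69), which matches the point that both patterns depart from the exponential. Note that D measures distance from the exponential in either direction, so by itself it does not say whether an interval is more regular or more clustered than Poisson arrivals. For a p-value, `scipy.stats.kstest(gaps, 'expon', args=(0, 10))` should give the same statistic, assuming SciPy is available.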

  • Thanks! Yes, I assume that the regularly spaced one is more random than cars passing precisely every ten seconds. I will take a look at the KS test. – CharlesM Oct 18 '12 at 16:15
  • But can I use the KS test to assign a value to each interval for its "degree" of clustering? I believe the KS test is simply used for hypothesis testing. I would like to do a cross-sectional comparison, and I will use the clustering value that I assign to each interval in a cross-sectional regression. – CharlesM Oct 18 '12 at 19:00