
I'm looking at how concentrated the occurrences of an event are within a given time interval. For example, suppose that an event occurred 4 times in an interval of length 10. I can represent this as a string where X means the event occurred and o means it didn't.

The distribution of events could be fairly "uniform" in time:

X o o X o o X o o X

Or it can be "concentrated" like this:

X X X o o o o o o X

Or like this:

X o o o X X o o o X
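To make these examples concrete, a minimal Python sketch, assuming the space-separated X/o notation used above, can turn each string into the positions of the events and the gaps between consecutive events. The helper names `event_positions` and `gaps` are hypothetical. Equal gaps correspond to the "uniform" case, while a mix of very short and very long gaps signals concentration.

```python
def event_positions(seq: str) -> list[int]:
    """0-based indices of the 'X' symbols in a space-separated X/o string."""
    return [i for i, s in enumerate(seq.split()) if s == "X"]


def gaps(seq: str) -> list[int]:
    """Distances between consecutive events (empty if fewer than 2 events)."""
    pos = event_positions(seq)
    # Pair each event position with the next one and take the difference.
    return [b - a for a, b in zip(pos, pos[1:])]


examples = [
    "X o o X o o X o o X",  # "uniform"
    "X X X o o o o o o X",  # "concentrated"
    "X o o o X X o o o X",
]
for s in examples:
    print(s, "->", event_positions(s), gaps(s))
```

The three strings give gaps [3, 3, 3], [1, 1, 7] and [4, 1, 4]; one could then look for a measure of how unequal these gaps are. Note that these gaps ignore any empty stretch before the first event or after the last one.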

I'm looking for an index or a distance that would allow me to measure and compare the concentration of event occurrences in my interval. Ideally, it would also allow comparing intervals of different lengths; for example, it would distinguish between:

X o X o X o X o X o X

And:

X o o o X X o X

Do you know of any index or measure that would allow such a thing?

Many thanks in advance.

juba
  • Would you know the minimum and maximum lengths of strings such as o o o X X o X that you would encounter in your data? –  Feb 04 '14 at 14:52
  • @user102890 As these are weeks in a year, yes: the minimum will be 2 and the maximum 52. – juba Feb 04 '14 at 15:01
  • This question looks similar to http://www.kaggle.com/forums/t/4539/measuring-fragmentation-of-activity –  Feb 04 '14 at 15:03
  • Is xxoooo equally concentrated for you as xxxxoooooooo? One might argue that the second is more concentrated because, despite there being more positions where the block could break (e.g. xxxoxooooooo), the 4 x's still cling together. – ttnphns Feb 04 '14 at 16:33
  • @ttnphns That's a good point. I tend to agree that the second one should be considered more concentrated, but I could live with a method that considers both equally. – juba Feb 05 '14 at 12:08
  • @user102890 Thanks for the pointer. It seems that the method described there assumes that all sequences have the same length, but I'll take a look at it. – juba Feb 05 '14 at 12:10
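One simple candidate index, offered only as a hypothetical sketch and not as an established measure, counts the runs (maximal blocks) of consecutive X's and rescales that count to [0, 1] using the string's length n and its number of events k. With k events and n - k non-events there can be at most min(k, n - k + 1) runs of X, so 0 means "as fragmented as possible" and 1 means "a single block", which makes strings of different lengths comparable. The function names are hypothetical. The index is coarse: it ignores how far apart separated events are.

```python
def x_runs(symbols: list[str]) -> int:
    # A run starts at every 'X' that is not preceded by another 'X'.
    return sum(
        1
        for i, s in enumerate(symbols)
        if s == "X" and (i == 0 or symbols[i - 1] != "X")
    )


def runs_concentration(seq: str) -> float:
    """Hypothetical concentration index in [0, 1] for a space-separated X/o string.

    0 = events split into as many separate blocks as possible,
    1 = all events in one contiguous block.
    """
    symbols = seq.split()
    n, k = len(symbols), symbols.count("X")
    # With n - k non-events there are at most n - k + 1 slots for X-blocks,
    # and there can never be more blocks than events.
    max_runs = min(k, n - k + 1)
    if max_runs < 2:
        raise ValueError("undefined: need at least 2 events and 1 non-event")
    return (max_runs - x_runs(symbols)) / (max_runs - 1)


for s in [
    "X o o X o o X o o X",
    "X X X o o o o o o X",
    "X o o o X X o o o X",
    "X o X o X o X o X o X",
    "X o o o X X o X",
]:
    print(f"{s:<24} {runs_concentration(s):.3f}")
```

For the five strings in the question this gives 0, 2/3, 1/3, 0 and 1/3 respectively, so it cannot separate X o o o X X o o o X from X o o o X X o X. It also scores xxoooo and xxxxoooooooo both as 1, i.e. it treats them as equally concentrated, which the comments above say would be acceptable.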
