This is more a request for advice and a suggested starting point than anything else (though anything else is welcome as well).

My data look something like this: 1,000,000 data points, of which about 9/10 are 0s; the rest are strictly positive.

The process is best understood as two steps: 1) Something like a binomial (Bernoulli) distribution determines whether an event occurs. 2) If the event occurs, we take a random draw from a continuous distribution over a positive range (something like an extreme value distribution).
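To make the two-step description concrete, here is one way it could be written as a single distribution. The notation is my own assumption, not something fixed by the problem: $p$ is the probability that the event occurs, and $G$ (with density $g$ and parameters $\theta$) is the continuous distribution on the positive range, so $G(0) = 0$.

$$
F(x) = (1 - p)\,\mathbf{1}\{x \ge 0\} + p\,G(x),
\qquad
f(x) =
\begin{cases}
1 - p, & x = 0,\\
p\,g(x;\theta), & x > 0.
\end{cases}
$$

Under this sketch, with $n_0$ zeros and $n_+$ positive observations, the log-likelihood splits into two independent pieces:

$$
\ell(p, \theta) = n_0 \log(1 - p) + n_+ \log p + \sum_{i:\,x_i > 0} \log g(x_i;\theta),
$$

so $p$ can be estimated from the zero/non-zero counts alone ($\hat{p} = n_+ / (n_0 + n_+)$), and $\theta$ from the positive observations alone.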

I am curious about the best way to combine these two separate distributions into a single distribution, so if anyone has any suggestions, they would be more than welcome!

  • Look up "hurdle models": log-normal and gamma hurdle models are popular choices where the continuous part is strictly positive. Sometimes the same kind of data is treated as arising from censoring of negative values (e.g. in Tobit models). – Scortchi - Reinstate Monica Oct 24 '14 at 11:30
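As a rough illustration of the log-normal hurdle idea from the comment, the sketch below simulates zero-inflated data and fits the two parts separately. Everything here is an assumption made for illustration: the function names (`simulate`, `fit_lognormal_hurdle`, `hurdle_mean`), the parameter values, and the choice of a log-normal as a stand-in for the positive part (the question suggests something closer to an extreme value distribution, which would need a different fitting step).

```python
import math
import random
import statistics


def simulate(n, p_event, mu, sigma, seed=0):
    """Two-step draw: with probability p_event take a log-normal value, else 0."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(mu, sigma) if rng.random() < p_event else 0.0
        for _ in range(n)
    ]


def fit_lognormal_hurdle(data):
    """Maximum-likelihood fit of a log-normal hurdle model.

    The two parts are fitted independently: the event probability from
    the share of non-zero values, the log-normal parameters from the
    logs of the positive values only.
    """
    positives = [x for x in data if x > 0]
    p_hat = len(positives) / len(data)
    logs = [math.log(x) for x in positives]
    mu_hat = statistics.fmean(logs)
    # The MLE of sigma uses the population (divide-by-n) standard deviation.
    sigma_hat = statistics.pstdev(logs, mu=mu_hat)
    return p_hat, mu_hat, sigma_hat


def hurdle_mean(p, mu, sigma):
    """Overall mean: P(event) times the log-normal mean."""
    return p * math.exp(mu + sigma ** 2 / 2)


# Hypothetical parameter values: about 9/10 zeros, as in the question.
data = simulate(100_000, p_event=0.1, mu=1.0, sigma=0.5)
p_hat, mu_hat, sigma_hat = fit_lognormal_hurdle(data)
print(p_hat, mu_hat, sigma_hat, hurdle_mean(p_hat, mu_hat, sigma_hat))
```

The same split should carry over to other choices for the positive part (e.g. a gamma): only the second half of `fit_lognormal_hurdle` would change.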

0 Answers