Related to this question, if I have 1500 or so jackpot results from a 6/49 lottery (numbers drawn, number of winners and prize per jackpot winner), how can I demonstrate that some numbers are less likely to be chosen by players than others? I don't have direct access to the distribution of numbers actually picked, of course.
Take, for example, the hypothesis that players are more likely than would be expected by chance to pick numbers that correspond to dates, i.e. numbers in the range 1 to 31:
I find a positive correlation between the count of drawn numbers greater than 31 and the prize per winner, and a negative correlation between the count of drawn numbers less than 32 and the prize per winner.
Also, the sum of the drawn numbers is positively correlated with the prize per winner, and the count of drawn numbers greater than 31 is positively correlated with a rollover event, when no one wins the jackpot.
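For concreteness, here is a minimal sketch of how correlations like these could be computed and checked with a permutation test. It uses only the Python standard library. The function names (`count_high`, `pearson`, `permutation_pvalue`) and the four toy records are hypothetical and purely illustrative; they are not my actual data. Note that in a 6-number draw the count of numbers below 32 is always 6 minus the count above 31, so the two correlations with prize per winner are mirror images of each other rather than independent evidence.

```python
import random
from math import sqrt

CUTOFF = 31  # largest value that can be a day of the month


def count_high(draw, cutoff=CUTOFF):
    """Count the drawn numbers strictly greater than cutoff."""
    return sum(1 for n in draw if n > cutoff)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Shuffles ys to break any link with xs and counts how often the
    shuffled |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy records (made up for illustration): drawn numbers and prize per winner.
draws = [
    (3, 7, 12, 19, 25, 30),
    (5, 11, 33, 38, 41, 47),
    (2, 14, 22, 35, 40, 44),
    (8, 9, 17, 21, 28, 36),
]
prize_per_winner = [1.2e6, 6.5e6, 4.0e6, 2.1e6]

high = [count_high(d) for d in draws]
sums = [sum(d) for d in draws]

print("r(count > 31, prize):", pearson(high, prize_per_winner))
print("r(sum, prize):       ", pearson(sums, prize_per_winner))
print("permutation p-value: ", permutation_pvalue(high, prize_per_winner))
```

With the real 1500 or so draws, the prize correlations would only use draws that had at least one winner, while the rollover correlation would use every draw with a 0/1 rollover indicator in place of the prize.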
What is the best way of approaching this with the data available?