In their classic book on the Federalist Papers, Mosteller and Wallace argue for a log penalty function: when you predict an event with probability $p$, you penalize yourself $-\log(p)$ if it occurs and $-\log(1-p)$ if it does not. The penalty is thus high when whatever happens was unexpected according to your prediction.
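As a minimal sketch of the rule in Python (the function name and interface are mine, not M&W's notation):

```python
import math

def log_penalty(p: float, occurred: bool) -> float:
    """Log penalty for a single binary prediction: -log(p) if the
    event occurred, -log(1 - p) if it did not."""
    return -math.log(p) if occurred else -math.log(1.0 - p)

print(log_penalty(0.9, True))   # ~0.105: event was expected and happened
print(log_penalty(0.9, False))  # ~2.303: event was expected but did not happen
```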
Their argument for this function rests on a simple, natural criterion: "the penalty function should encourage the prediction of the correct probabilities if they are known." Assuming that the total penalty is summed over all predictions and that there are three or more of them, M&W claim that the log penalty function is the only one (up to affine transformation) for which the "expected penalty is minimized over all predictions" by the correct probabilities.
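In the simplest binary case, the "correct probabilities are optimal" part of this claim is easy to verify directly (the uniqueness part takes more work): if the event has true probability $q$ and you predict $p$, your expected penalty is $-q\log(p)-(1-q)\log(1-p)$, and setting its derivative in $p$ to zero gives

$$-\frac{q}{p}+\frac{1-q}{1-p}=0 \quad\Longleftrightarrow\quad p=q,$$

with second derivative $q/p^2+(1-q)/(1-p)^2>0$, so predicting the true probability is indeed the minimizer.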
A good test, then, is to track your accumulated log penalties. If, after a long time (or by means of some independent oracle), you obtain accurate estimates of what the probabilities actually were, you can compare your penalty with the minimum possible one. The average of that difference measures your long-run predictive performance (the lower the better), and it is an excellent way to compare two or more competing predictors as well.
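Here is a sketch of that bookkeeping in Python; the simulated "oracle", the predictor names, and the helper function are all invented for illustration:

```python
import math
import random

def log_penalty(p: float, occurred: bool) -> float:
    return -math.log(p) if occurred else -math.log(1.0 - p)

random.seed(0)
# Pretend an oracle has revealed the true probabilities of 10,000 events.
true_probs = [random.uniform(0.05, 0.95) for _ in range(10_000)]
outcomes = [random.random() < q for q in true_probs]

# Penalty accumulated by the true probabilities themselves --
# the benchmark called the minimum possible one above.
best = sum(log_penalty(q, o) for q, o in zip(true_probs, outcomes))

predictors = {
    "calibrated":    true_probs,  # predicts the truth exactly
    "overconfident": [0.99 if q > 0.5 else 0.01 for q in true_probs],
}
for name, preds in predictors.items():
    total = sum(log_penalty(p, o) for p, o in zip(preds, outcomes))
    print(f"{name}: average excess penalty = {(total - best) / len(outcomes):.4f}")
```

By the properness argument above, this average excess is nonnegative in expectation, and the predictor with the lower value is the better one.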