I have a dataset containing several hundred thousand observations, of which a small number contain an event of interest x. Let's say that my total dataset is large enough that I have decent confidence in the overall frequency of x.
But what I'm really interested in is how often x occurs together with some other condition y, and y is much rarer in the dataset. The number of observations of y is too small for me to estimate confidently how strongly it correlates with x, and the observed number of cases with both x and y (x+y) is often zero, even though the true frequency of x+y must be greater than zero.
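To make the situation concrete, here is a small Python sketch with made-up counts. None of these numbers come from my real data, and the variable names and the `rate` helper are just for illustration. It shows that the naive frequency of x among the y cases comes out as exactly zero, even though so few co-occurrences would be expected anyway that a zero count says very little.

```python
# Hypothetical counts, invented purely to illustrate the problem.
n_total = 300_000  # all observations
n_x = 1_500        # observations containing event x
n_y = 40           # observations satisfying condition y
n_xy = 0           # observations containing both x and y


def rate(events, trials):
    """Raw observed frequency: events divided by trials."""
    return events / trials


p_x = rate(n_x, n_total)       # overall frequency of x
p_x_given_y = rate(n_xy, n_y)  # naive frequency of x among the y cases

# Number of x+y cases I would expect to see if y had no effect on x at all.
expected_xy = p_x * n_y

print(f"overall frequency of x:       {p_x:.4f}")
print(f"naive frequency of x given y: {p_x_given_y:.4f}")
print(f"expected x+y count if y had no effect: {expected_xy:.2f}")
```

With counts like these, fewer than one co-occurrence is expected even if y makes no difference, so observing zero cannot distinguish a negative correlation from plain bad luck.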
So how can I estimate the true probability of x+y, given the overall frequency of x in the dataset and the small-ish number of instances of y that I have?
Edit: I know that x and y are not independent, but at the outset I don't know anything about the nature of the relationship between them. The entire point of the exercise is to determine whether they have a positive or negative correlation.
Sorry, I know next to nothing about statistics and I don't know what the proper terminology is to describe this situation.