Estimating an unknown distribution from a mixture

Question

I have two data sets, $\{x_i\}$ and $\{y_i\}$. I know that $\{x_i\}$ was sampled from some distribution $X$, and that $\{y_i\}$ was sampled from a mixture of $X$ and some other unknown distribution $Y$. I want to estimate the mixing ratio, i.e. how many of the samples in $\{y_i\}$ come from $X$.

If I make some assumptions about $Y$ (such as it being normal), this is just a simple mixture model problem, but ideally I don't want to do this. I'm wondering whether there is an approach that avoids such assumptions, or whether the problem simply isn't solvable.

One idea I had was to represent $Y$ with a bunch of kernels (evenly spaced normal distributions with known $\sigma$) and use MLE to find the mixing ratios, but I assume doing so would just set the mixing ratio for $X$ to zero and give me the KDE. Perhaps there is some way of penalising this, but my only thought was to set a prior on what I thought the mixing ratio of $X$ was, which I would rather avoid.

If it is possible to solve this problem for categorical mixture models, then I can just bin my data, but I couldn't find a way of solving the problem in the categorical setting either, or really anything about parameter estimates for categorical mixture models (which makes sense, because the sample distribution would have the maximum likelihood).
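The worry in the parenthetical can be checked numerically. The following is a minimal sketch, not a solution: it assumes the data are binned, that the bin probabilities of $X$ (`p_x`) are known exactly rather than estimated from $\{x_i\}$, and it uses made-up counts for $\{y_i\}$ (`counts_y`). All names and numbers are hypothetical. For each candidate mixing proportion `alpha` it picks the bin distribution of $Y$ that reproduces the empirical frequencies and compares the multinomial log-likelihoods.

```python
import numpy as np

def implied_p_y(p_mix, p_x, alpha):
    """Bin probabilities of Y implied by p_mix = alpha * p_x + (1 - alpha) * p_y."""
    return (p_mix - alpha * p_x) / (1.0 - alpha)

def log_likelihood(counts, p_x, p_y, alpha):
    """Multinomial log-likelihood (up to a constant) of the binned mixture sample."""
    p_mix = alpha * p_x + (1.0 - alpha) * p_y
    mask = counts > 0  # bins with zero count contribute nothing
    return float(np.sum(counts[mask] * np.log(p_mix[mask])))

# Hypothetical example: 4 bins, p_x assumed known exactly.
p_x = np.array([0.4, 0.3, 0.2, 0.1])
counts_y = np.array([30, 30, 25, 15])
p_hat = counts_y / counts_y.sum()  # empirical bin frequencies of {y_i}

results = {}
for alpha in (0.0, 0.25, 0.5, 0.7, 0.9):
    p_y = implied_p_y(p_hat, p_x, alpha)
    feasible = bool(np.all(p_y >= -1e-12))  # p_y must be a valid distribution
    ll = log_likelihood(counts_y, p_x, p_y, alpha) if feasible else float("nan")
    results[alpha] = (feasible, ll)
    print(f"alpha={alpha:.2f} feasible={feasible} loglik={ll:.6f}")
```

With these numbers, every `alpha` up to 0.75 (the smallest ratio of `p_hat` to `p_x` across bins) is feasible and attains the same maximal log-likelihood, so the likelihood alone cannot distinguish between them. Larger values such as 0.9 would require negative probabilities for $Y$.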

If you only have nonparametric density estimates for the two datasets (and stick to Frequentist methods), then I think there's an identifiability issue. If you know the mixing proportion $\alpha$, then one could estimate the density of $Y$ with $\hat{f}_Y(y)=(\hat{f}_{XY}(y)-\alpha \hat{f}_X(y))/(1-\alpha)$. But I think you can't get there from here if you don't know both the mixing proportion and the distribution of $Y$. Maybe your Bayesian suggestion might have promise. — JimB, Feb 07 '24 at 18:27
After reading my comment again, I don't think my next-to-last sentence was very clear. What I meant was that either the mixing proportion or the distribution of $Y$ would need to be known to estimate the other. If both were unknown, then that's where the identifiability issue comes into play. — JimB, Feb 08 '24 at 05:06
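
The identifiability point in the comments can be made a little more explicit. The following is a sketch under the idealised assumption that the densities are known exactly (no estimation error), writing $f_{XY}$ for the density of the mixture that $\{y_i\}$ is drawn from and $\alpha$ for the proportion coming from $X$:

$$f_{XY}(y) = \alpha f_X(y) + (1-\alpha) f_Y(y) \quad\Longrightarrow\quad f_Y(y) = \frac{f_{XY}(y) - \alpha f_X(y)}{1-\alpha}.$$

For $f_Y$ to be a valid density it must be non-negative, which requires $\alpha f_X(y) \le f_{XY}(y)$ for all $y$, i.e.

$$0 \le \alpha \le \alpha_{\max} = \inf_{y:\, f_X(y) > 0} \frac{f_{XY}(y)}{f_X(y)}.$$

Every $\alpha < 1$ in this range yields an $f_Y$ that is non-negative, integrates to one, and reproduces exactly the same $f_{XY}$. Without further assumptions on $Y$, the data can therefore pin down at most the upper bound $\alpha_{\max}$, not $\alpha$ itself. Since $f_{XY}/f_X = \alpha + (1-\alpha) f_Y/f_X$, the bound coincides with the true $\alpha$ only when $f_Y/f_X$ gets arbitrarily close to zero somewhere, for example when there is a region where $X$ has mass and $Y$ has none.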
