By better I mean smaller density error against the true PDF. Say that $X$ is a random variable whose true PDF $f_X$ we wish to approximate by an estimate $\hat f_X$. Then my goal is to find the $\hat f_X$ that minimizes:
$$ \mathbb{E}[(\hat f_X(X) - f_X(X))^2] $$
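For concreteness, this criterion can be approximated by Monte Carlo when the true PDF is known. A minimal sketch, where the standard normal as the true density and the too-wide normal as the candidate estimate are both just assumed examples:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_pdf(x):
    # Assumed example: standard normal density f_X.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def est_pdf(x):
    # A deliberately miscalibrated candidate estimate (wider normal).
    s = 1.2
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

# Monte Carlo approximation of E[(f_hat(X) - f(X))^2] with X ~ f_X.
x = rng.standard_normal(100_000)
err = np.mean((est_pdf(x) - true_pdf(x))**2)
print(err)
```

A better estimate drives `err` toward zero; in practice $f_X$ is unknown, which is why this quantity can only be targeted indirectly (e.g. via cross-validation).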
Now that I have defined better, my question is:
- We have a random variable $X$ that takes values in $\mathbb{R}^n$.
- We have a finite set of samples $\mathcal{L} = \{x_1,x_2,\ldots,x_k\}$ that contains some outcomes of the variable $X$ as obtained in the past (think of $\mathcal{L}$ as a learning set).
- Question: what is the best method to estimate $f_X$ by empirically analyzing the samples in $\mathcal{L}$? I.e., we must not assume any known kernel or functional form (I guess this makes the problem non-parametric).
Permitted assumptions:
- Samples in $\mathcal{L}$ are drawn independently and identically distributed (i.i.d.) from the population with PDF $f_X$.
My thought train:
I can empirically estimate the CDF from $\mathcal{L}$, which yields a step function (the empirical CDF). I can then smooth this CDF so that, when I differentiate it, I obtain a continuous PDF. I find this easy to understand, and I can see that all my assumptions are confined to the interpolation/smoothing of the steps in the CDF.
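The approach above can be sketched as follows. This is only an illustration under assumed choices: synthetic standard-normal data stands in for $\mathcal{L}$, and a smoothing spline with an arbitrary smoothing factor `s` plays the role of the CDF smoother (exactly the step where the assumptions live):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
samples = rng.standard_normal(500)  # stand-in for the learning set L

# Empirical CDF: at the i-th sorted sample point, F_hat jumps to i/k.
xs = np.sort(samples)
k = len(xs)
ecdf = np.arange(1, k + 1) / k

# Smooth the staircase with a quartic spline; the smoothing factor s
# is a tuning choice -- this is where the modeling assumption enters.
spline = UnivariateSpline(xs, ecdf, k=4, s=0.01)

# Differentiate the smoothed CDF to get a continuous PDF estimate.
pdf_hat = spline.derivative()
print(pdf_hat(0.0))  # density estimate at x = 0
```

Note that nothing here forces the derivative to stay non-negative or integrate to exactly one, which hints at why smoothing the CDF is not automatically a clean solution to the density problem.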
However, when I try to estimate the PDF directly, I just can't think of a non-parametric method that works from the samples in $\mathcal{L}$ alone. I can only think of estimating the PDF parametrically, after assuming a certain family of PDFs.
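For contrast, the parametric route I have in mind looks like this. A minimal sketch, where the Gaussian family is the assumed model (choosing it is precisely the parametric assumption), in which case maximum likelihood reduces to sample moments:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_normal(500)  # stand-in for the learning set L

# Assume a Gaussian family; then the MLE is given by sample moments.
mu = samples.mean()
sigma = samples.std()

def pdf_hat(x):
    # The estimated PDF -- only trustworthy if the Gaussian
    # assumption about f_X actually holds.
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
```

The whole difficulty is that if $f_X$ is not in the assumed family, no amount of data makes this estimate converge to the truth.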
To be honest, I am not even sure why getting non-parametric PDFs from CDFs looks approachable, while getting them directly doesn't. But this is exactly why I am asking this question. Why is this?
- Is my intuition due to some underlying facts that I am unable to articulate?
- Or is my intuition merely accidental, and scientifically neither approach is actually any better than the other?