Is kernel density in kernel density estimation derived or defined?

Question

Is kernel density in kernel density estimation derived or defined? If it is defined, why is it defined this way? If it is derived, how is it derived? In particular, why $h^d$ and not $h$ in the multivariate case, where the kernel density estimate is defined as $$\frac{1}{nh^d}\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right)$$

Could you please clarify what you mean by "derive" and "define"? Your second question is answered by considering that the integral of the kernel density must be $1$. — whuber, Sep 26 '17 at 17:01
Derive meaning it can be gotten from a base formula and expand to become this. Define meaning someone thinks it is a good idea for density to look this way so it cannot be explained. — user10024395, Sep 27 '17 at 01:40
@user2675516 what "base formula"? Given such a definition, all of statistics, probability theory and mathematics is "defined", since it is all based on some definitions, assumptions and abstract models... — Tim, Mar 01 '18 at 08:26

Ben · Answer 1 · 2022-07-04T08:15:02.310

The form of the KDE is essentially stipulated by definition, but the scaling constant you're asking about follows from the properties of the kernel used in that definition. For the case where you have an observable random vector $\mathbf{x} = (x_1,...,x_d)$, we can define the kernel density estimator (KDE) by the proportionality requirement:

$$\hat{f}(\mathbf{x}|h) \propto \sum_{i=1}^n K(\mathbf{r}_{i,h}) \quad \quad \quad \quad \quad \mathbf{r}_{i,h} \equiv \frac{\mathbf{x}-\mathbf{x}_i}{h},$$

where $h>0$ is the bandwidth parameter for the KDE. The kernel function $K$ is usually stipulated to be a density function with zero expectation. Since it is a density function, we have:

$$\int \limits_{\mathbb{R}^d} K(\mathbf{r}) \ d\mathbf{r} = 1.$$

Consequently, using the change of variable $\mathbf{r}=(\mathbf{x}-\mathbf{x}_i)/h$ we have $h^d \ d\mathbf{r} = d\mathbf{x}$ so we get:

$$\int \limits_{\mathbb{R}^d} K \bigg( \frac{\mathbf{x}-\mathbf{x}_i}{h} \bigg) \ d \mathbf{x} = h^d.$$

This means that:

$$\begin{align} \int \limits_{\mathbb{R}^d} \sum_{i=1}^n K \bigg( \frac{\mathbf{x}-\mathbf{x}_i}{h} \bigg) \ d\mathbf{x} &= \sum_{i=1}^n \int \limits_{\mathbb{R}^d} K \bigg( \frac{\mathbf{x}-\mathbf{x}_i}{h} \bigg) \ d\mathbf{x} \\[6pt] &= \sum_{i=1}^n h^d \\[14pt] &= n h^d. \\[6pt] \end{align}$$

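Dividing by $nh^d$ so that the result integrates to one then gives the KDE in the form you are using:

$$\hat{f}(\mathbf{x}|h) = \frac{1}{nh^d} \sum_{i=1}^n K \bigg( \frac{\mathbf{x}-\mathbf{x}_i}{h} \bigg).$$

The constant is $h^d$ rather than $h$ because the change of variables rescales each of the $d$ coordinates by $h$, so the volume element picks up one factor of $h$ per dimension.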
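A quick numerical check of this derivation, written as a sketch under stated assumptions: a standard Gaussian kernel (any kernel that is a density would do), a single scalar bandwidth $h$, and a midpoint-rule integral over a bounded cube, which is only accurate because the Gaussian tails outside the cube are negligible. The helper names (`kernel_sum`, `kde`, `integrate`) and the data points are hypothetical.

```python
import itertools
import math


def gaussian_kernel(r):
    """Standard d-variate normal density: zero mean, integrates to 1 over R^d."""
    d = len(r)
    return math.exp(-0.5 * sum(c * c for c in r)) / (2 * math.pi) ** (d / 2)


def kernel_sum(x, data, h):
    """Unnormalised sum_i K((x - x_i) / h), using one scalar bandwidth h."""
    return sum(
        gaussian_kernel([(a - b) / h for a, b in zip(x, xi)]) for xi in data
    )


def kde(x, data, h):
    """Kernel sum divided by n * h^d, the normalisation in the question."""
    return kernel_sum(x, data, h) / (len(data) * h ** len(x))


def integrate(f, d, lo, hi, steps):
    """Midpoint-rule approximation of the integral of f over [lo, hi]^d."""
    dx = (hi - lo) / steps
    axis = [lo + (k + 0.5) * dx for k in range(steps)]
    return sum(f(list(p)) for p in itertools.product(axis, repeat=d)) * dx ** d


if __name__ == "__main__":
    h = 0.5
    for d in (1, 2, 3):
        data = [[0.0] * d, [1.0] * d, [-0.7] * d]  # three hypothetical points
        raw = integrate(lambda x: kernel_sum(x, data, h), d, -6.0, 6.0, 40)
        normalised = integrate(lambda x: kde(x, data, h), d, -6.0, 6.0, 40)
        # raw should be close to n * h**d; normalised should be close to 1
        print(d, raw, len(data) * h ** d, normalised)
```

Dividing the kernel sum by $nh$ instead of $nh^d$ would give an integral of $nh^d/(nh) = h^{d-1}$, which equals one only when $d=1$ (or $h=1$).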

score 2 · Answer 2 · answered Mar 01 '18 at 07:58

You can "derive" it from the empirical distribution function $$ F_n(x) = \frac{1}{n} \sum_{i=1}^{n}\mathbf{1}_{\{X_i \leq x\}}. $$

Since the density function at a point $x$ of its support is defined as the derivative of the cumulative distribution function, a straightforward estimator of the density function is $$ \hat{f}_{n}(x) = \frac{F_n(x + h_n) - F_n(x - h_n)}{2h_n} = \frac{1}{2nh_n}\sum_{i=1}^{n}\mathbf{1}_{\{x-h_n < X_i \leq x+h_n\}}. $$

This is a kernel estimator with the uniform kernel $K(u) = \frac{1}{2}\mathbf{1}_{\{-1 < u \leq 1\}}$: every observation in the window $(x-h_n,x+h_n]$ receives the same weight $\frac{1}{2}$. Replacing the uniform kernel with a smooth kernel function $K$ motivates the general nonparametric density estimator $$ \hat{f}_n(x) = \frac{1}{nh_n}\sum_{i=1}^{n} K \left( \frac{X_i-x}{h_n} \right). $$
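A small check of this argument: assuming the uniform kernel $K(u)=\tfrac12\mathbf{1}_{\{-1<u\le 1\}}$, the kernel estimator coincides with the difference quotient of the empirical CDF. The half-open window is chosen so the match is exact, since $F_n(x+h_n)-F_n(x-h_n)$ counts the observations in $(x-h_n, x+h_n]$. The function names, the sample, and the Gaussian kernel used as a smooth alternative are illustrative assumptions.

```python
import math


def ecdf(data, x):
    """Empirical CDF F_n(x) = (1/n) * #{i : X_i <= x}."""
    return sum(1 for xi in data if xi <= x) / len(data)


def difference_quotient(data, x, h):
    """(F_n(x + h) - F_n(x - h)) / (2h), the finite-difference estimate."""
    return (ecdf(data, x + h) - ecdf(data, x - h)) / (2 * h)


def uniform_kernel(u):
    """Weight 1/2 on the half-open window (-1, 1], zero elsewhere."""
    return 0.5 if -1 < u <= 1 else 0.0


def gaussian_kernel(u):
    """Standard normal density, one possible smooth replacement."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_1d(data, x, h, kernel):
    """(1 / (n h)) * sum_i K((X_i - x) / h)."""
    return sum(kernel((xi - x) / h) for xi in data) / (len(data) * h)


if __name__ == "__main__":
    data = [0.12, 0.5, 0.77, 1.3, 2.05]  # hypothetical sample
    h = 0.35
    for x in (0.0, 0.4, 1.0, 1.5):
        # the difference quotient and the uniform-kernel KDE agree up to
        # rounding; the last value is the smoothed Gaussian-kernel estimate
        print(
            x,
            difference_quotient(data, x, h),
            kde_1d(data, x, h, uniform_kernel),
            kde_1d(data, x, h, gaussian_kernel),
        )
```

Exact floating-point agreement can fail only when an observation sits exactly on a window boundary $x \pm h_n$, which the sample above avoids.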

score 0 · Answer 3 · answered Jun 15 '22 at 06:43

It is a statistical model. All statistical models are made up; they are all wrong, but useful. A model is an abstraction that represents something else, so in that sense it is defined. Like all statistical models, it is based on some assumptions about the data, and in this sense it is derived from those assumptions, as noted by stollenm in the other answer.

> Derive meaning it can be gotten from a base formula and expand to become this. Define meaning someone thinks it is a good idea for density to look this way so it cannot be explained.

There is no clear answer to your question, because it is hard to draw a sharp boundary between "derived" and "defined" unless you limit "defined" to axioms, in which case the answer is obvious. Every model, even the most black-box ones such as $k$-NN, has some math to back it up, so by your definition every model is, to some degree, "derived". In all cases, models are mathematical abstractions made of assumptions mixed with clever ideas.

That quotation is nearly gibberish. Perhaps you could paraphrase it to make better sense? — whuber, Jun 15 '22 at 14:10
Not sure what you mean @whuber, this is OP's words I'm referring to. — Tim, Jun 15 '22 at 14:44
Right. But they make almost no sense as written because the English is garbled. — whuber, Jun 15 '22 at 15:56
