
I am interested in estimating the expected value of a function $f(x)$ with respect to a probability density function $P(x)$.

I am exploring a method that requires changing variables from the underlying distribution $P(x)$ to a uniform distribution $u\sim U[0, 1]$. The idea is to map the uniform samples into the input of $f(x)$ as $f(\Phi^{-1}(u))$, where $u = \Phi(x)$ and $\Phi(x)=\int_{-\infty}^x P(x')\,dx'$ is the CDF. Substituting $u = \Phi(x)$, so that $du = P(x)\,dx$, gives:

$\int f(x) P(x)\, dx = \int_0^1 f(\Phi^{-1}(u))\, du$

This was straightforward enough when I let $P(x)$ be an uncorrelated standard normal distribution:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(1)
nsamps = 200000

def f(x):
    return np.sum(x ** 2, 0)

u = np.random.uniform(0, 1, (2, nsamps))
x = np.random.normal(0, 1, (2, nsamps))

print('Expectation from normal samples: ', np.mean(f(x)))
print('Expectation from uniform samples: ', np.mean(f(norm.ppf(u))))
```

Output:

```
Expectation from normal samples:  1.99866994956538
Expectation from uniform samples:  1.9984323166733162
```

In my real problem, I have estimated $P(x)$ using a kernel density estimation procedure. Even if I can find $\Phi$, I am unsure whether it is even possible to compute $\Phi^{-1}(u)$ in this case.

**Is it possible to compute $\Phi^{-1}(u)$ in this case? If so, how could this be accomplished?**

That is my question. I ask it in the context of my challenge problem: estimating $\int f(\Phi^{-1}(u))\,du$, where $\Phi(x)$ is the CDF of the joint probability density function found using a kernel density estimator.

What follows is my best attempt so far:

This Stack Overflow thread covers the case where $x$ is one-dimensional. The solution interpolates the CDF as a function of $x$.
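For concreteness, here is a minimal sketch of that one-dimensional approach (my reconstruction, not the thread's exact code), assuming a `gaussian_kde` fit: the CDF of a Gaussian KDE is the average of the component normal CDFs, and the inverse can be tabulated by interpolating $x$ against the CDF values on a grid.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.special import ndtr
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
kde = gaussian_kde(samples)
bw = np.sqrt(kde.covariance[0, 0])  # kernel standard deviation

# CDF of the KDE on a grid: average of the component normal CDFs
grid = np.linspace(samples.min() - 3 * bw, samples.max() + 3 * bw, 512)
cdf = np.array([ndtr((g - kde.dataset[0]) / bw).mean() for g in grid])

# invert by interpolating x as a function of the CDF values
invcdf = interp1d(cdf, grid, bounds_error=False, fill_value=(grid[0], grid[-1]))

u = rng.uniform(0, 1, 200000)
print(np.mean(invcdf(u) ** 2))  # ~E[x^2] under the KDE, about 1 + bw**2 here
```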

I've tried emulating this solution with multidimensional $x$, but it is problematic to interpolate a two-dimensional output from a one-dimensional input. This is my best attempt so far at approximating the integral using uniform samples, but it does not compute the correct expectation.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.special import ndtr
from scipy.interpolate import interp2d  # note: deprecated in recent SciPy versions

def f(x):
    return np.sum(x ** 2, 0)

# create kde
samples = np.random.normal(loc=0, scale=1, size=(2, 1000))
kde = gaussian_kde(samples)

# compute cdf
cdf = tuple(ndtr(np.ravel(item - kde.dataset.T) / kde.factor).mean()
            for item in samples.T)

# interpolate x_2 from cdf and x_1
invcdf = interp2d(cdf, samples[0, :], samples[1, :])

# though this is my best attempt so far at using uniform samples, it is not very good
print('estimation from uniform samples: ',
      np.mean(f(invcdf(np.random.uniform(0, 1, 1000), 0))))

# expectation approximated by sampling the KDE
print('estimation from direct samples: ', np.mean(f(kde.resample(10000))))
```

Output:

```
estimation from uniform samples:  1026.192868721676
estimation from direct samples:  2.178433139240008
```

Edit

From g g's answer below, I think I have put together a code example that suits my needs:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from scipy.stats import norm
import matplotlib.pyplot as plt

# load data
dat = load_diabetes()
train_x = dat.data[:, [4, 5]]
train_x -= train_x.mean()
train_x /= train_x.std()
dimension = train_x.shape[1]
data_size = train_x.shape[0]

# define length scale of KDE estimate
LSCALE = 0.5

# create KDE of data
def kde(x, lscale=1):
    density = 0
    for point in train_x:
        density += (norm.pdf(x[0], loc=point[0], scale=lscale)
                    * norm.pdf(x[1], loc=point[1], scale=lscale))
    density /= train_x.shape[0]
    return density

# compute Cholesky factors
C = np.zeros((data_size, dimension, dimension))
for imat in range(data_size):
    C[imat, :, :] = np.linalg.cholesky(np.cov(train_x.T))

# it seems like the covariance should depend on the length scale of the kernels,
# so I multiplied it by the length scale
C *= LSCALE

# define phi: cumulative probabilities, here all equal 1/data_size for simplicity
qprob = np.arange(data_size) / data_size

# function psi doing the transformation
def psi(u):
    # determine component according to the first coordinate
    comp = sum(qprob < u[0]) - 1
    # determine normal according to the remaining coordinates
    Z = norm.ppf(u[1:])
    return train_x[comp, :] + C[comp, :, :] @ Z

# map samples from U(0, 1) into the sample space using psi
np.random.seed(10)
u_samps = np.random.uniform(0, 1, (dimension + 1, 100))
generated_samps = np.array([psi(u) for u in u_samps.T])

# plot results
NX = 16
plot_x = np.linspace(generated_samps.min(0)[0], generated_samps.max(0)[0], NX)
plot_y = np.linspace(generated_samps.min(0)[1], generated_samps.max(0)[1], NX)
X, Y = np.meshgrid(plot_x, plot_y)
plot_points = np.array([X, Y]).reshape(2, -1)
dens = np.array([kde(p, LSCALE) for p in plot_points.T])
plt.contourf(X, Y, dens.reshape(NX, NX), 1000)
plt.scatter(train_x[:, 0], train_x[:, 1], c='k', s=10, label='Observed')
plt.scatter(generated_samps[:, 0], generated_samps[:, 1], c='r', s=10, label='Generated')
plt.xlim(plot_x.min(), plot_x.max())
plt.ylim(plot_y.min(), plot_y.max())
plt.legend()
plt.savefig('density')
plt.clf()
```

[Figure: KDE density contours with observed (black) and generated (red) samples]

```
>>> np.mean(f(train_x.T))
1.1494072502058117
>>> np.mean(f(generated_samps.T))
1.105391993079355
```

  • Is it at all possible to perform the integral with the change of variables for any joint probability distributions of $x$? This answer indicated it is possible for correlated Gaussian inputs. https://stats.stackexchange.com/questions/572787/bayesian-quadrature-to-find-expectation-of-unkown-function-w-r-t-known-pdf/572800#572800 – kilojoules Apr 25 '22 at 07:21
  • This comment refers to this answer: Watch out! It is not $P$, i.e. the density, which you need to invert, but another function $\Phi$. This may explain the issue you have with the inversion. Note that $\Phi$ is not a CDF as it does not map to $]0,1[$. I could not find a sufficiently detailed description of "gaussian_kde" in the SciPy docs, but it is most likely not simply a correlated Gaussian but something more complicated, such as a Gaussian mixture. – g g Apr 25 '22 at 07:41
  • Thanks for this feedback, all. I reworded the question to be more careful to use $\Phi$ instead of $P$ where appropriate. Say I have $\Phi$: how can I then compute $\Phi^{-1}$? – kilojoules Apr 25 '22 at 07:43
  • As I understand the OP, he wants to use a library for integration with respect to the uniform measure. The measure he is interested in is not uniform, so I suggested pulling back his measure to the uniform measure and performing the integral with the pull-back. – g g Apr 25 '22 at 07:44
  • The practical issue is now whether it is possible to find such a pull-back map explicitly. – g g Apr 25 '22 at 07:46
  • You should change the title of the question. Since you are not looking for a CDF but for "a transformation of a kernel density estimate to uniform", the title as it stands is misleading. – g g Apr 25 '22 at 10:23
  • Thanks! I changed the title. – kilojoules Apr 25 '22 at 19:10
  • Because the KDE is a mixture of kernels, https://stats.stackexchange.com/questions/411647 answers your question. There's no explicit (closed-form analytical) solution, so the key is to implement an efficient numerical procedure. That means deploying a suitable root finder (see the sketch after this thread). cc @gg – whuber Apr 25 '22 at 20:05
  • @whuber I do not see how the linked answer applies to this question. The OP neither wants to compute the quantiles nor the likelihood. He "just" wants to compute the expectation in a certain way (due to his restrictions on numerical libraries). While the expectation of a mixture is straightforward to compute in terms of the components this will contradict his goal of having as few evaluations of the integrand as possible. – g g Apr 25 '22 at 20:42
  • @gg I was responding to the emphasized question in the middle of the post, "Is it possible to compute $\Phi^{-1}(u)$ in this case?" I have understood the introductory material about computing expectations as background and motivating material, utilizing the OP's typographic emphasis to determine the ultimate question. After all, when the kernel has a mean of zero, the expectation of the KDE is the arithmetic mean of the data. – whuber Apr 25 '22 at 20:47
  • Re the motivating question: why not use a different method to estimate the expectation? Your approach appears to be a very complicated way to compute the arithmetic mean of your data! – whuber Apr 26 '22 at 12:57
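Following up on whuber's root-finder suggestion: here is a minimal sketch (my addition, not code from the thread) that inverts the CDF of a one-dimensional Gaussian KDE numerically with `scipy.optimize.brentq`, since the CDF of a mixture has no closed-form inverse.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.special import ndtr
from scipy.optimize import brentq

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
kde = gaussian_kde(samples)
bw = np.sqrt(kde.covariance[0, 0])  # kernel standard deviation

def kde_cdf(x):
    # CDF of the mixture: average of the component normal CDFs
    return ndtr((x - kde.dataset[0]) / bw).mean()

def kde_ppf(u):
    # solve kde_cdf(x) = u with a bracketing root finder; the bracket is
    # padded so the CDF straddles any u in (0, 1)
    lo = samples.min() - 10 * bw
    hi = samples.max() + 10 * bw
    return brentq(lambda x: kde_cdf(x) - u, lo, hi)

u = rng.uniform(0, 1, 2000)
x = np.array([kde_ppf(ui) for ui in u])
print(np.mean(x ** 2))  # again close to E[x^2] under the KDE
```

One way to extend this to a multidimensional KDE is coordinate-by-coordinate inversion of the conditional CDFs (the Rosenblatt transformation), which works because each conditional of a Gaussian mixture is again a Gaussian mixture.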