I am interested in estimating the expected value of a function, $f(x)$ with respect to a probability density function, $P(x)$.
I am exploring a method that requires changing variables from the underlying distribution $P(x)$ to a uniform distribution $u\sim U[0, 1]$. The idea is to map the uniform variable into the input of $f(x)$ as $f(\Phi^{-1}(u))$, where $u=\Phi(x)$ and $\Phi(x)=\int_{-\infty}^x P(x')dx'$ is the CDF:
$\int f(x) P(x) dx = \int f(\Phi^{-1}(u))du$
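To spell out the change of variables behind this identity (it is just the standard substitution, with $u$ running over $(0, 1)$ as $x$ runs over the support of $P$):
$u = \Phi(x), \qquad du = \Phi'(x)\,dx = P(x)\,dx$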
This was straightforward enough when I let $P(x)$ be an uncorrelated normal distribution:
import numpy as np
from scipy.stats import norm
np.random.seed(1)
nsamps = 200000
def f(x): return np.sum(x ** 2, 0)
u = np.random.uniform(0, 1, (2, nsamps))
x = np.random.normal(0, 1, (2, nsamps))
print('Expectation from normal samples: ', np.mean(f(x)))
print('Expectation from uniform samples: ', np.mean(f(norm.ppf(u))))
Expectation from normal samples: 1.99866994956538
Expectation from uniform samples: 1.9984323166733162
In my real problem, I have estimated $P(x)$ using a kernel density estimation procedure. Even if I can find $\Phi$, it is not clear to me whether $\Phi^{-1}(u)$ can be estimated at all in this case.
Is it possible to compute $\Phi^{-1}(u)$ in this case? If so, how could this be accomplished?
That is my question. I ask it in the context of my challenge problem: estimating $\int f(\Phi^{-1}(u))du$, where $\Phi(x)$ is the CDF of a joint probability density found with a kernel density estimator.
The rest of this post is my best attempt so far:
This stackoverflow thread covers the case when $x$ is one dimensional. The solution interpolates the cdf as a function of $x$.
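For reference, here is a minimal 1-D sketch of that idea, assuming a scipy gaussian_kde: evaluate the KDE's CDF on a grid (each Gaussian kernel contributes a normal CDF) and interpolate in the inverse direction, i.e. $x$ as a function of the CDF value. The grid, the bandwidth taken from kde.covariance, and the interp1d fill values are my own choices, not from that thread.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.special import ndtr
from scipy.interpolate import interp1d

np.random.seed(0)
def f(x): return x ** 2

# 1-D KDE of standard-normal data, so the true expectation is close to 1
samples = np.random.normal(0, 1, 1000)
kde = gaussian_kde(samples)

# evaluate the KDE's CDF on a grid: each Gaussian kernel contributes a normal CDF
bw = np.sqrt(kde.covariance[0, 0])
grid = np.linspace(samples.min() - 3 * bw, samples.max() + 3 * bw, 2000)
cdf = ndtr((grid[:, None] - kde.dataset[0][None, :]) / bw).mean(axis=1)

# interpolate x as a function of the CDF value, i.e. an approximate inverse CDF
invcdf = interp1d(cdf, grid, bounds_error=False, fill_value=(grid[0], grid[-1]))

u = np.random.uniform(0, 1, 100000)
print('expectation from uniform samples:', np.mean(f(invcdf(u))))
With standard-normal data the printed expectation should come out close to 1, slightly inflated by the squared kernel bandwidth.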
I've tried emulating this solution with multidimensional $x$, but interpolating a two-dimensional output from a one-dimensional input is problematic. The attempt below uses uniform samples but does not recover the correct expectation.
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.special import ndtr
from scipy.interpolate import interp2d

def f(x): return np.sum(x ** 2, 0)

# create kde
samples = np.random.normal(loc=0, scale=1, size=(2, 1000))
kde = gaussian_kde(samples)

# compute cdf
cdf = tuple(ndtr(np.ravel(item - kde.dataset.T) / kde.factor).mean()
            for item in samples.T)

# interpolate x_2 from the cdf and x_1
invcdf = interp2d(cdf, samples[0, :], samples[1, :])

# this is my best attempt so far at using uniform samples, but it is not very good
print('estimation from uniform samples: ', np.mean(f(invcdf(np.random.uniform(0, 1, 1000), 0))))

# expectation approximated by sampling the KDE directly
print('estimation from direct samples: ', np.mean(f(kde.resample(10000))))
estimation from uniform samples: 1026.192868721676
estimation from direct samples: 2.178433139240008
Edit
From g g's answer below, I think I have put together a code example that suits my needs:
import numpy as np
from sklearn.datasets import load_diabetes
from scipy.stats import norm
import matplotlib.pyplot as plt

# same f as in the examples above (not defined in the original snippet)
def f(x): return np.sum(x ** 2, 0)

# load data
dat = load_diabetes()
train_x = dat.data[:, [4, 5]]
train_x -= train_x.mean()
train_x /= train_x.std()
dimension = train_x.shape[1]
data_size = train_x.shape[0]

# define the length scale of the KDE estimate
LSCALE = 0.5

# create KDE of the data
def kde(x, lscale=1):
    density = 0
    for point in train_x:
        density += norm.pdf(x[0], loc=point[0], scale=lscale) * norm.pdf(x[1], loc=point[1], scale=lscale)
    density /= train_x.shape[0]
    return density

# compute Cholesky factors
C = np.zeros((data_size, dimension, dimension))
for imat in range(data_size):
    C[imat, :, :] = np.linalg.cholesky(np.cov(train_x.T))
# it seems like the covariance should depend on the length scale of the kernels,
# so I multiplied it by the length scale
C *= LSCALE

# define the transformation psi
# cumulative probabilities, here all equal to 1/data_size for simplicity
qprob = np.arange(data_size) / data_size

# function psi doing the transformation
def psi(u):
    # determine the component according to the first coordinate
    comp = sum(qprob < u[0]) - 1
    # determine the normal according to the remaining coordinates
    Z = norm.ppf(u[1:])
    return train_x[comp, :] + C[comp, :, :] @ Z

# map samples drawn from U]0,1[ into the sample space using psi
np.random.seed(10)
u_samps = np.random.uniform(0, 1, (dimension + 1, 100))
generated_samps = np.array([psi(u) for u in u_samps.T])

# plot results
if True:
    NX = 16
    plot_x = np.linspace(generated_samps.min(0)[0], generated_samps.max(0)[0], NX)
    plot_y = np.linspace(generated_samps.min(0)[1], generated_samps.max(0)[1], NX)
    X, Y = np.meshgrid(plot_x, plot_y)
    plot_points = np.array([X, Y]).reshape(2, -1)
    dens = np.array([kde(p, LSCALE) for p in plot_points.T])
    plt.contourf(X, Y, dens.reshape(NX, NX), 1000)
    plt.scatter(train_x[:, 0], train_x[:, 1], c='k', s=10, label='Observed')
    plt.scatter(generated_samps[:, 0], generated_samps[:, 1], c='r', s=10, label='Generated')
    plt.xlim(plot_x.min(), plot_x.max())
    plt.ylim(plot_y.min(), plot_y.max())
    plt.legend()
    plt.savefig('density')
    plt.clf()
np.mean(f(train_x.T))
1.1494072502058117
np.mean(f(generated_samps.T))
1.105391993079355

This comment refers to the answer mentioned above: Watch out! It is not $P$, i.e. the density, which you need to invert, but another function $\Phi$. This may explain the issue you have with the inversion. Note that $\Phi$ is not a CDF, as it does not map to $]0,1[$.
I could not find a sufficiently detailed description of "gaussian_kde" in the SciPy docs. But it is most likely not simply a correlated Gaussian but something more complicated such as a Gaussian mixture.
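For what it's worth, gaussian_kde does behave as an equal-weight Gaussian mixture with one component centred on each data point, all sharing the covariance stored in its covariance attribute. Assuming that structure, the component-then-Gaussian transform from the edit above can be built directly from a fitted kde; here is a minimal sketch (the toy data and the names psi and f are mine):
import numpy as np
from scipy.stats import norm, gaussian_kde

np.random.seed(2)
def f(x): return np.sum(x ** 2, 0)

# fit a KDE to 2-D data, as in the original problem
samples = np.random.normal(0, 1, (2, 1000))
kde = gaussian_kde(samples)

# gaussian_kde places one Gaussian on each data point, all with covariance
# kde.covariance (assumed mixture structure, consistent with the docs)
L = np.linalg.cholesky(kde.covariance)
n = kde.dataset.shape[1]

def psi(u):
    # first uniform coordinate selects the mixture component (equal weights)
    comp = min(int(u[0] * n), n - 1)
    # remaining coordinates pass through the componentwise normal inverse CDF
    z = norm.ppf(u[1:])
    return kde.dataset[:, comp] + L @ z

u = np.random.uniform(0, 1, (3, 20000))
x = np.array([psi(ui) for ui in u.T]).T
print('expectation from uniform samples:', np.mean(f(x)))
print('expectation from kde.resample:   ', np.mean(f(kde.resample(20000))))
The two printed estimates should agree up to Monte Carlo error, since psi and kde.resample draw from the same mixture.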