I have a bivariate normal distribution composed of the univariate normal distributions $X_1$ and $X_2$ with $\rho \approx 0.3$.
$$ \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} , \begin{pmatrix} \sigma^2_1 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma^2_2 \end{pmatrix} \right) $$
Is there a simple way to calculate in R the cumulative probability of $X_1$ being less than a value $z$ given a particular slice of $X_2$ (between two values $a,b$) given we know all the parameters $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho$?
$P(X_1 < z | a < X_2 < b)$
Can the distribution function I am looking for match (or be approximated by) the distribution function of a univariate normal distribution (to use qnorm/pnorm)? Ideally this would be the case so I can perform the calculation with less dependencies on libraries (e.g. on a MySQL server).
This is the bivariate distribution I am using:
means <- c(79.55920, 52.29355)
variances <- c(268.8986, 770.0212)
rho <- 0.2821711
covariancePartOfMatrix <- sqrt(variances[1]) * sqrt(variances[2]) * rho
sigmaMatrix <- matrix(c(variances[1],covariancePartOfMatrix,covariancePartOfMatrix,variances[2]), byrow=T, ncol=2)
n <- 10000
dat <- MASS::mvrnorm(n=n, mu=means, Sigma=sigmaMatrix)
plot(dat)
This is my numerical attempt to get the correct result. However it uses generated data from the bivariate distribution and I'm not convinced it will give the correct result.
a <- 79.5
b <- 80.5
z <- 50
sliceOfDat <- subset(data.frame(dat), X1 > a, X1 < b)
estimatedMean <- mean(sliceOfDat[,c(2)])
estimatedDev <- sd(sliceOfDat[,c(2)])
estimatedPercentile <- pnorm(z, estimatedMean, estimatedDev)
Edit - R implementation of solution based on whuber's answer
Here is an implementation of the accepted solution using integrate, compared against my original idea based on sampling. The accepted solution provides the expected output 0.5, whereas my original idea deviated by a significant amount (0.41). Update - See wheber's edit for a better implementation.
# Bivariate distribution parameters
means <- c(79.55920, 52.29355)
variances <- c(268.8986, 770.0212)
rho <- 0.2821711
# Generate sample data for bivariate distribution
n <- 10000
covariancePartOfMatrix <- sqrt(variances[1]) * sqrt(variances[2]) * rho
sigmaMatrix <- matrix(c(variances[1],covariancePartOfMatrix,covariancePartOfMatrix,variances[2]), byrow=T, ncol=2)
dat <- MASS::mvrnorm(n=n, mu=means, Sigma=sigmaMatrix)
# Input parameters to test the estimation
w = 79.55920
a <- w - 0.5
b <- w + 0.5
z <- 52.29355
# Univariate approximation using randomness
sliceOfDat <- subset(data.frame(dat), X1 > a, X1 < b)
estimatedMean <- mean(sliceOfDat[,c(2)])
estimatedDev <- sd(sliceOfDat[,c(2)])
estimatedPercentile <- pnorm(z, estimatedMean, estimatedDev)
# OUTPUT: 0.411
# Numerical approximation from exact solution
adaptedZ <- (z - means[2]) / sqrt(variances[2])
adaptedB <- (b - means[1]) / sqrt(variances[1])
adaptedA <- (a - means[1]) / sqrt(variances[1])
exactSolutionCoeff <- 1 / (pnorm(adaptedB) - pnorm(adaptedA))
integrand <- function(x) pnorm((adaptedZ - rho * x) / sqrt(1 - rho * rho)) * dnorm(x)
exactSolutionInteg <- integrate(integrand, adaptedA, adaptedB)
# 0.0121, abs.error 1.348036e-16, "OK"
exactPercentile = exactSolutionCoeff * exactSolutionInteg$value
# OUTPUT: 0.500

integrateto make sure it's not running into accuracy problems. – whuber May 20 '16 at 20:20