
I want to generate random numbers from a truncated multivariate normal distribution, specified as follows:

$ \begin{bmatrix} Y \\ X \end{bmatrix} \sim N \begin{pmatrix} \begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}, \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \end{pmatrix} $ where $-1<Y<1$ and $0< X < 1$.

I have tried the R command "rtmvnorm". The generated numbers seem to satisfy $-1<Y<1$ and $0<X < 1$, and their means are close to the theoretical values. However, the variances and covariance are much smaller than the theoretical values. Below are the code and output.

library(TruncatedNormal)
n <- 500000
mu <- c(0, 0.5)
sigma <- matrix(c(1, 0.3, 0.3, 1), 2,2)

lb <- c(-1, 0)
ub <- c( 1, 1)
X <- rtmvnorm(n, mu, sigma, lb, ub)
head(X, 10)
colMeans(X)
cov(X)

Output is as follows:

> head(X,10)
             [,1]       [,2]
 [1,]  0.25065762 0.86349926
 [2,] -0.60200807 0.28179668
 [3,]  0.07264893 0.03382819
 [4,]  0.48001486 0.83693168
 [5,]  0.19537872 0.34182959
 [6,]  0.11081347 0.96187636
 [7,]  0.98743046 0.85316679
 [8,] -0.92329383 0.52252609
 [9,]  0.10285388 0.33629839
[10,] -0.31047567 0.75213528
> colMeans(X)
[1] -0.0008484419  0.5002994220
> cov(X)
            [,1]        [,2]
[1,] 0.287918806 0.007407156
[2,] 0.007407156 0.080514144

I'm wondering if it is possible to adjust this so that the empirical (co)variances are close to the theoretical values.

  • Are you comparing with the covariance of the untruncated Normal or with the covariance of the truncated Normal? They are not the same. – Xi'an Feb 15 '23 at 11:13
  • The parameters of the truncated distribution regard the parameters of the distribution before the truncation, see e.g. https://stats.stackexchange.com/questions/226662/truncated-normal-distribution-theoretical-mean-outside-truncation-boundaries/226666#226666 ; this is "by design". – Tim Feb 15 '23 at 11:17
  • In other words, when you enter mu and sigma in the code, they are not the mean and covariance of the simulated random variable. – Xi'an Feb 15 '23 at 11:23
  • Related: https://stats.stackexchange.com/questions/157963. (The truncation there is radial rather than Cartesian.) – whuber Feb 15 '23 at 15:15

1 Answer


In general, $\begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}$ and $\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$ are not the expectation and covariance matrix of your truncated bivariate normal distribution.
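As a quick univariate illustration of this point (a minimal sketch using base R, not part of the packages discussed here): drawing from $N(0,1)$ and keeping only the values in $(-1,1)$ is exactly truncation, and the retained sample has a variance well below the nominal $\sigma^2 = 1$.

```r
# Truncating N(0, 1) to (-1, 1) by rejection: the retained draws have
# variance ~0.29, far below the sigma^2 = 1 of the untruncated normal.
set.seed(1)
z <- rnorm(1e6)
z_trunc <- z[z > -1 & z < 1]
var(z_trunc)  # ~ 0.29, not 1
```

The same shrinkage is what you observed in the bivariate case: the `mu` and `sigma` arguments describe the normal *before* truncation.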

One way to calculate these is given by Manjunath and Wilhelm (2021) and implemented in the mtmvnorm function of the tmvtnorm package by the same authors.

They derived the expected value and covariance matrix of a truncated $d$–variate normal random variable $\mathbf X = \left(X_1, \ldots, X_d\right)^\top$, component-wise truncated from below at $\mathbf a = \left(a_1, \ldots, a_d\right)^\top$ and from above at $\mathbf b = \left(b_1, \ldots, b_d\right)^\top$, by partially differentiating the corresponding moment generating function $$ \mathop{M_{\mathbf X}}\left(\mathbf t\right) = \frac{\left|2\pi\boldsymbol\Sigma\right|^{-1/2}}{\mathop{\mathbb P}\left(\mathbf a \leq \mathbf X \leq \mathbf b\right)} \int_{\mathbf a}^{\mathbf b} \exp\left(-\frac{1}{2} \left[\left(\mathbf x - \boldsymbol \mu\right)^\top {\boldsymbol \Sigma}^{-1} \left(\mathbf x - \boldsymbol \mu\right) - 2 \cdot \mathbf t^\top\mathbf x\right] \right) \mathrm d \mathbf x, $$ where $\leq$ denotes component-wise inequality, $\mathbf t = \left(t_1, \ldots, t_d\right)^\top$, and $\boldsymbol \mu$ and $\boldsymbol \Sigma$ are the expected value and (non-singular) covariance matrix of the underlying untruncated $d$–variate normal random variable.
This yields

\begin{align} \mathop{\mathbb E}\left(X_i\right) &= \frac{\partial \mathop{M_{\mathbf X}}\left(\mathbf t\right)}{\partial t_i}\bigg|_{\mathbf t = \mathbf 0} \\ &= \mu_i + \sum_{k=1}^d \sigma_{i,k} \left(\mathop{f_{X_k}}\left(a_k\right) - \mathop{f_{X_k}}\left(b_k\right)\right), \end{align}

and

\begin{align} \mathop{\mathbb E}\left(X_iX_j\right) &= \frac{\partial^2 \mathop{M_{\mathbf X}}\left(\mathbf t\right)}{\partial t_i\partial t_j}\bigg|_{\mathbf t = \mathbf 0}\\ &= \sigma_{i,j} + \sum_{k=1}^d \sigma_{i,k} \frac{\sigma_{j,k}\left(a_k\mathop{f_{X_k}}\left(a_k\right)-b_k\mathop{f_{X_k}}\left(b_k\right)\right)}{\sigma_{k,k}} \\ &\quad + \sum_{k=1}^d \sigma_{i,k} \sum_{q \neq k} \left(\sigma_{j, q} - \frac{\sigma_{k,q}\sigma_{j,k}}{\sigma_{k,k}}\right)\Big[\left(\mathop{f_{X_k,X_q}}\left(a_k,a_q\right) - \mathop{f_{X_k,X_q}}\left(a_k,b_q\right)\right) \\ &\qquad - \left(\mathop{f_{X_k,X_q}}\left(b_k,a_q\right) - \mathop{f_{X_k,X_q}}\left(b_k,b_q\right)\right)\Big], \end{align} which also determines the entries of the covariance matrix of $\mathbf X$ via $\mathop{\mathrm{Cov}}\left(X_i,X_j\right) = \mathop{\mathbb E}\left(X_iX_j\right) - \mathop{\mathbb E}\left(X_i\right) \mathop{\mathbb E}\left(X_j\right)$.
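As a sanity check of these formulas (a sketch I am adding, for the special case $d = 1$, $\mu = 0$, $\sigma = 1$), the expressions reduce to $\mathop{\mathbb E}(X) = f(a) - f(b)$ and $\mathop{\mathbb E}(X^2) = 1 + a f(a) - b f(b)$, with $f$ the truncated marginal density, and can be compared against Monte Carlo:

```r
# d = 1 special case of the formulas above, with mu = 0, sigma = 1,
# truncation to [a, b] = [0, 1].
a <- 0; b <- 1
Z <- pnorm(b) - pnorm(a)           # P(a <= X <= b) before truncation
f <- function(x) dnorm(x) / Z      # marginal density of the truncated X
m1 <- f(a) - f(b)                  # E(X)   ~ 0.4599
m2 <- 1 + a * f(a) - b * f(b)      # E(X^2) ~ 0.2911
v  <- m2 - m1^2                    # Var(X) ~ 0.0797

# Monte Carlo comparison by rejection sampling
set.seed(1)
z <- rnorm(1e6)
z <- z[z > a & z < b]
c(theory = v, empirical = var(z))
```

The theoretical and empirical values agree to the Monte Carlo error, while the untruncated variance is $\sigma^2 = 1$.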

That reduces the problem to evaluating $\mathop{f_{X_k}}$ and $\mathop{f_{X_k,X_q}}$, the marginal density of $X_k$ and $\left(X_k,X_q\right)$, respectively, at specific points.
The former can be achieved with the algorithm given in Cartinhour (1990). The latter turns out to be the product of $1/{\mathop{\mathbb P}\left(\mathbf a \leq \mathbf X \leq \mathbf b\right)}$, a bivariate normal density (evaluated at those points), and the multivariate normal probability obtained by integrating a $(d-2)$-variate normal density over a hyperrectangle.
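For $d = 2$, as in the question, these marginals are available directly (a sketch of the idea, not the packages' implementation): $\mathop{f_{X_1,X_2}}$ factors as the density of $X_1$ times the conditional density of $X_2 \mid X_1$, so integrating $x_2$ over $[a_2, b_2]$ leaves a difference of normal CDFs, and the truncated moments follow by one-dimensional numerical integration. The variable names below are mine.

```r
# Truncated marginal of X1 for the bivariate setup in the question.
mu  <- c(0, 0.5); rho <- 0.3
lb  <- c(-1, 0);  ub  <- c(1, 1)
s_c <- sqrt(1 - rho^2)                         # sd of X2 | X1 = x
num <- function(x) {                           # unnormalised marginal of X1
  m_c <- mu[2] + rho * (x - mu[1])             # mean of X2 | X1 = x
  dnorm(x, mu[1], 1) *
    (pnorm(ub[2], m_c, s_c) - pnorm(lb[2], m_c, s_c))
}
P    <- integrate(num, lb[1], ub[1])$value     # P(a <= X <= b)
f_X1 <- function(x) num(x) / P                 # truncated marginal density
EX1  <- integrate(function(x) x * f_X1(x), lb[1], ub[1])$value
VX1  <- integrate(function(x) x^2 * f_X1(x), lb[1], ub[1])$value - EX1^2
c(EX1 = EX1, VX1 = VX1)                        # ~ c(0, 0.288)
```

These match the first entries of `$tmean` and `$tvar` returned by `mtmvnorm` for the same problem.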

For your problem:

library(TruncatedNormal)

set.seed(42)

n <- 500000
mu <- c(0, 0.5)
sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)

lb <- c(-1, 0)
ub <- c( 1, 1)
X <- rtmvnorm(n, mu, sigma, lb, ub)
colMeans(X)

[1] 0.0007696717 0.5002392194

cov(X)

            [,1]        [,2]
[1,] 0.287768996 0.007486595
[2,] 0.007486595 0.080491050

while the authors' mtmvnorm code gives

# devtools::install_github("stefanwilhelm/tmvtnorm")
tmvtnorm::mtmvnorm(mean = mu, 
                   sigma = sigma, 
                   lower = lb, 
                   upper = ub)
# $tmean
# [1] 0.0 0.5
# 
# $tvar
#             [,1]        [,2]
# [1,] 0.287549082 0.007606965
# [2,] 0.007606965 0.080405828

References

Cartinhour, J. (1990). One-dimensional marginal density functions of a truncated multivariate normal density function. Communications in Statistics – Theory and Methods, 19(1), 197–203.

Manjunath, B. G., & Wilhelm, S. (2021). Moments calculation for the doubly truncated multivariate normal density. Journal of Behavioral Data Science, 1(1), 17–33.
