
I want to generate random numbers from a truncated multivariate normal distribution, specified as follows:

$ \begin{bmatrix} Y \\ X \end{bmatrix} \sim N \begin{pmatrix} \begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}, \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \end{pmatrix} $ where $-1<Y<1$ and $0< X < 1$.

I have tried the R command "rtmvnorm". The generated numbers seem to satisfy $-1<Y<1$ and $0<X < 1$, and their means are close to the theoretical values. However, the variances and covariance are much smaller than the theoretical values. Below are the code and output.

library(TruncatedNormal)
n <- 500000
mu <- c(0, 0.5)
sigma <- matrix(c(1, 0.3, 0.3, 1), 2,2)

lb <- c(-1, 0)
ub <- c( 1, 1)
X <- rtmvnorm(n, mu, sigma, lb, ub)
head(X, 10)
colMeans(X)
cov(X)

Output is as follows:

> head(X,10)
             [,1]       [,2]
 [1,]  0.25065762 0.86349926
 [2,] -0.60200807 0.28179668
 [3,]  0.07264893 0.03382819
 [4,]  0.48001486 0.83693168
 [5,]  0.19537872 0.34182959
 [6,]  0.11081347 0.96187636
 [7,]  0.98743046 0.85316679
 [8,] -0.92329383 0.52252609
 [9,]  0.10285388 0.33629839
[10,] -0.31047567 0.75213528
> colMeans(X)
[1] -0.0008484419  0.5002994220
> cov(X)
            [,1]        [,2]
[1,] 0.287918806 0.007407156
[2,] 0.007407156 0.080514144

I'm wondering if it is possible to adjust this so that the empirical (co)variances are close to the theoretical values.

  • Are you comparing with the covariance of the untruncated Normal or with the covariance of the truncated Normal? They are not the same. – Xi'an Feb 15 '23 at 11:13
  • The parameters of the truncated distribution regard the parameters of the distribution before the truncation, see e.g. https://stats.stackexchange.com/questions/226662/truncated-normal-distribution-theoretical-mean-outside-truncation-boundaries/226666#226666 ; this is "by design". – Tim Feb 15 '23 at 11:17
  • In other words, when you enter mu and sigma in the code, they are not the mean and covariance of the simulated random variable. – Xi'an Feb 15 '23 at 11:23
  • Related: https://stats.stackexchange.com/questions/157963. (The truncation there is radial rather than Cartesian.) – whuber Feb 15 '23 at 15:15

1 Answer


In general, $\begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}$ and $\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$ are not the expectation and covariance matrix of your truncated bivariate normal distribution.
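As a quick univariate illustration of this point (a minimal sketch using base R, not part of the packages discussed here): drawing from $N(0,1)$ and keeping only the values in $(-1,1)$ is exactly truncation, and the retained sample has a variance well below the nominal $\sigma^2 = 1$.

```r
# Truncating N(0, 1) to (-1, 1) by rejection: the retained draws have
# variance ~0.29, far below the sigma^2 = 1 of the untruncated normal.
set.seed(1)
z <- rnorm(1e6)
z_trunc <- z[z > -1 & z < 1]
var(z_trunc)  # ~ 0.29, not 1
```

The same shrinkage is what you observed in the bivariate case: the `mu` and `sigma` arguments describe the normal *before* truncation.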

One way to calculate these is given by Manjunath and Wilhelm (2021) and implemented in the mtmvnorm function of the tmvtnorm package by the same authors.

They derived the expected value and covariance matrix of a truncated $d$–variate normal random variable $\mathbf X = \left(X_1, \ldots, X_d\right)^\top$, component-wise truncated from below at $\mathbf a = \left(a_1, \ldots, a_d\right)^\top$ and from above at $\mathbf b = \left(b_1, \ldots, b_d\right)^\top$, by partially differentiating the corresponding moment generating function $$ \mathop{M_{\mathbf X}}\left(\mathbf t\right) = \frac{\left|2\pi\boldsymbol\Sigma\right|^{-1/2}}{\mathop{\mathbb P}\left(\mathbf a \leq \mathbf X \leq \mathbf b\right)} \int_{\mathbf a}^{\mathbf b} \exp\left(-\frac{1}{2} \left[\left(\mathbf x - \boldsymbol \mu\right)^\top {\boldsymbol \Sigma}^{-1} \left(\mathbf x - \boldsymbol \mu\right) - 2 \cdot \mathbf t^\top\mathbf x\right] \right) \mathrm d \mathbf x, $$ where $\leq$ denotes component-wise inequality, $\mathbf t = \left(t_1, \ldots, t_d\right)^\top$, and $\boldsymbol \mu$ and $\boldsymbol \Sigma$ are the expected value and (non-singular) covariance matrix of the underlying untruncated $d$–variate normal random variable.
This yields

\begin{align} \mathop{\mathbb E}\left(X_i\right) &= \frac{\partial \mathop{M_{\mathbf X}}\left(\mathbf t\right)}{\partial t_i}\bigg|_{\mathbf t = \mathbf 0} \\ &= \mu_i + \sum_{k=1}^d \sigma_{i,k} \left(\mathop{f_{X_k}}\left(a_k\right) - \mathop{f_{X_k}}\left(b_k\right)\right), \end{align}

and

\begin{align} \mathop{\mathbb E}\left(X_iX_j\right) &= \frac{\partial^2 \mathop{M_{\mathbf X}}\left(\mathbf t\right)}{\partial t_i\partial t_j}\bigg|_{\mathbf t = \mathbf 0}\\ &= \sigma_{i,j} + \sum_{k=1}^d \sigma_{i,k} \frac{\sigma_{j,k}\left(a_k\mathop{f_{X_k}}\left(a_k\right)-b_k\mathop{f_{X_k}}\left(b_k\right)\right)}{\sigma_{k,k}} \\ &\quad + \sum_{k=1}^d \sigma_{i,k} \sum_{q \neq k} \left(\sigma_{j, q} - \frac{\sigma_{k,q}\sigma_{j,k}}{\sigma_{k,k}}\right)\Big[\left(\mathop{f_{X_k,X_q}}\left(a_k,a_q\right) - \mathop{f_{X_k,X_q}}\left(a_k,b_q\right)\right) \\ &\qquad - \left(\mathop{f_{X_k,X_q}}\left(b_k,a_q\right) - \mathop{f_{X_k,X_q}}\left(b_k,b_q\right)\right)\Big], \end{align} which also determines the entries of the covariance matrix of $\mathbf X$ via $\mathop{\mathrm{Cov}}\left(X_i,X_j\right) = \mathop{\mathbb E}\left(X_iX_j\right) - \mathop{\mathbb E}\left(X_i\right) \mathop{\mathbb E}\left(X_j\right)$.
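As a sanity check of these formulas (a sketch I am adding, for the special case $d = 1$, $\mu = 0$, $\sigma = 1$), the expressions reduce to $\mathop{\mathbb E}(X) = f(a) - f(b)$ and $\mathop{\mathbb E}(X^2) = 1 + a f(a) - b f(b)$, with $f$ the truncated marginal density, and can be compared against Monte Carlo:

```r
# d = 1 special case of the formulas above, with mu = 0, sigma = 1,
# truncation to [a, b] = [0, 1].
a <- 0; b <- 1
Z <- pnorm(b) - pnorm(a)           # P(a <= X <= b) before truncation
f <- function(x) dnorm(x) / Z      # marginal density of the truncated X
m1 <- f(a) - f(b)                  # E(X)   ~ 0.4599
m2 <- 1 + a * f(a) - b * f(b)      # E(X^2) ~ 0.2911
v  <- m2 - m1^2                    # Var(X) ~ 0.0797

# Monte Carlo comparison by rejection sampling
set.seed(1)
z <- rnorm(1e6)
z <- z[z > a & z < b]
c(theory = v, empirical = var(z))
```

The theoretical and empirical values agree to the Monte Carlo error, while the untruncated variance is $\sigma^2 = 1$.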

That reduces the problem to evaluating $\mathop{f_{X_k}}$ and $\mathop{f_{X_k,X_q}}$, the marginal density of $X_k$ and $\left(X_k,X_q\right)$, respectively, at specific points.
The former can be achieved with the algorithm given in Cartinhour (1990). The latter turns out to be the product of $1/{\mathop{\mathbb P}\left(\mathbf a \leq \mathbf X \leq \mathbf b\right)}$, a bivariate normal density (evaluated at those points), and the multivariate normal probability obtained by integrating a $(d-2)$-variate normal density over a hyperrectangle.
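For $d = 2$, as in the question, these marginals are available directly (a sketch of the idea, not the packages' implementation): $\mathop{f_{X_1,X_2}}$ factors as the density of $X_1$ times the conditional density of $X_2 \mid X_1$, so integrating $x_2$ over $[a_2, b_2]$ leaves a difference of normal CDFs, and the truncated moments follow by one-dimensional numerical integration. The variable names below are mine.

```r
# Truncated marginal of X1 for the bivariate setup in the question.
mu  <- c(0, 0.5); rho <- 0.3
lb  <- c(-1, 0);  ub  <- c(1, 1)
s_c <- sqrt(1 - rho^2)                         # sd of X2 | X1 = x
num <- function(x) {                           # unnormalised marginal of X1
  m_c <- mu[2] + rho * (x - mu[1])             # mean of X2 | X1 = x
  dnorm(x, mu[1], 1) *
    (pnorm(ub[2], m_c, s_c) - pnorm(lb[2], m_c, s_c))
}
P    <- integrate(num, lb[1], ub[1])$value     # P(a <= X <= b)
f_X1 <- function(x) num(x) / P                 # truncated marginal density
EX1  <- integrate(function(x) x * f_X1(x), lb[1], ub[1])$value
VX1  <- integrate(function(x) x^2 * f_X1(x), lb[1], ub[1])$value - EX1^2
c(EX1 = EX1, VX1 = VX1)                        # ~ c(0, 0.288)
```

These match the first entries of `$tmean` and `$tvar` returned by `mtmvnorm` for the same problem.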

For your problem:

library(TruncatedNormal)

set.seed(42)

n <- 500000
mu <- c(0, 0.5)
sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)

lb <- c(-1, 0)
ub <- c( 1, 1)
X <- rtmvnorm(n, mu, sigma, lb, ub)
colMeans(X)

[1] 0.0007696717 0.5002392194

cov(X)

            [,1]        [,2]
[1,] 0.287768996 0.007486595
[2,] 0.007486595 0.080491050

while the authors' mtmvnorm code gives

# devtools::install_github("stefanwilhelm/tmvtnorm")
tmvtnorm::mtmvnorm(mean = mu, 
                   sigma = sigma, 
                   lower = lb, 
                   upper = ub)
# $tmean
# [1] 0.0 0.5
# 
# $tvar
#             [,1]        [,2]
# [1,] 0.287549082 0.007606965
# [2,] 0.007606965 0.080405828

References

Cartinhour, J. (1990). One-dimensional marginal density functions of a truncated multivariate normal density function. Communications in Statistics – Theory and Methods, 19(1), 197–203.

Manjunath, B. G., & Wilhelm, S. (2021). Moments calculation for the doubly truncated multivariate normal density. Journal of Behavioral Data Science, 1(1), 17–33.
