The Wikipedia article on the Gaussian copula states that it can be defined as:

$$ C_P(u_1, \ldots, u_d) = \boldsymbol{\Phi}_P(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)), $$

where $\boldsymbol{\Phi}_P$ is the joint cumulative distribution function of a multivariate normal distribution with mean vector zero and covariance matrix equal to the correlation matrix $P$ and $\Phi^{-1}$ is the inverse cumulative distribution function of a standard normal.

However, when we want to sample values from a Gaussian copula, we simulate from the multivariate normal distribution with mean zero and correlation matrix $P$, and then transform each margin using the probability integral transform with the standard normal distribution function $\Phi$. In that respect, it appears that for simulation we are doing something like:

$$ C_P(u_1, \ldots, u_d) = \boldsymbol{\Phi}_P(\Phi(u_1), \ldots, \Phi(u_d)), $$

instead.

Could someone explain how the simulation procedure matches the first formula rather than the second? Additionally, how should we interpret the values $u_1, \ldots, u_d$? Are they just any values from the support of a standard normal?

Nick Cox
user321627

2 Answers

Note that $u_j$ is marginally uniform(0,1). Thus, plugging $u_j$ in $\Phi^{-1}$ gives a marginally Normal(0,1) distribution by the basic properties of inverse CDFs.
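As a quick numerical illustration of this inverse-CDF property, the following minimal sketch draws uniform(0,1) values and pushes them through $\Phi^{-1}$. It uses only Python's standard library; the helper name `draw_uniform_open`, the seed, and the sample size are arbitrary choices made for this example and are not part of the original answer.

```python
import random
from statistics import NormalDist, fmean, pvariance

rng = random.Random(42)
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def draw_uniform_open(rng):
    """Return a uniform draw strictly inside (0, 1); inv_cdf rejects 0 and 1."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


# u_j is uniform(0, 1); Phi^{-1}(u_j) should then behave like a Normal(0, 1) draw.
u = [draw_uniform_open(rng) for _ in range(50000)]
z = [std_normal.inv_cdf(ui) for ui in u]

z_mean = fmean(z)      # should be close to 0
z_var = pvariance(z)   # should be close to 1
print(z_mean, z_var)
```

The sample mean and variance should land near 0 and 1 respectively, up to Monte Carlo noise.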

With that in mind, we can think of the Gaussian copula as saying "the relation between the values of a random variable pushed through its marginal CDF for each variable is the same as if the data first came from a multivariate normal and then was converted to marginally uniform random variables by pushing through the marginal normal CDFs". In that case, using $\Phi^{-1}$ is just pushing the data back to the original form.

It might help to illustrate the process by which you can simulate data from a Gaussian copula. The steps would be:

1.) Select your correlation matrix $\Sigma$ (i.e. diagonal should be all 1's). This is the correlation matrix for your copula values.

2.) Select your marginal distributions and the corresponding inverse CDFs (quantile functions) $F^{-1}_j$ for each variable.

3.) Simulate $Z \sim MVN(0, \Sigma)$

4.) Compute $u_j = \Phi(z_j)$. Note that the marginal distribution of $u_j$ is uniform(0,1), but $u_j$ is not independent of $u_{j'}$ (unless the correlation between $z_j$ and $z_{j'}$ is 0).

5.) Compute $x_j = F^{-1}_j(u_j)$
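The five steps above can be sketched in code. This is a minimal illustration for the bivariate case only, written with Python's standard library; the function name `sample_gaussian_copula`, the correlation 0.7, and the two example marginals (Exponential(1) and Uniform(0, 10)) are assumptions made for the example, not something specified in the answer.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, inv_cdfs, seed=0):
    """Draw n pairs (x1, x2) linked by a bivariate Gaussian copula (illustrative helper)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    samples = []
    for _ in range(n):
        # Step 3: Z ~ MVN(0, Sigma) with Sigma = [[1, rho], [rho, 1]],
        # built from two independent standard normals (2x2 Cholesky factor).
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = (g1, rho * g1 + math.sqrt(1.0 - rho ** 2) * g2)
        # Step 4: u_j = Phi(z_j), marginally uniform(0, 1) but dependent.
        u = tuple(std_normal.cdf(zj) for zj in z)
        # Step 5: x_j = F_j^{-1}(u_j).
        samples.append(tuple(f_inv(uj) for f_inv, uj in zip(inv_cdfs, u)))
    return samples


# Step 2: example marginals, chosen only for illustration.
def exp_inv(u):
    return -math.log(1.0 - u)  # quantile function of Exponential(1)


def unif_inv(u):
    return 10.0 * u  # quantile function of Uniform(0, 10)


# Step 1 is the choice rho = 0.7, i.e. Sigma = [[1, 0.7], [0.7, 1]].
xs = sample_gaussian_copula(20000, rho=0.7, inv_cdfs=[exp_inv, unif_inv])
mean_x1 = sum(x[0] for x in xs) / len(xs)  # should be near 1, the Exponential(1) mean
mean_x2 = sum(x[1] for x in xs) / len(xs)  # should be near 5, the Uniform(0, 10) mean
print(mean_x1, mean_x2)
```

Each margin keeps its own distribution, while the dependence between `x1` and `x2` is inherited from the correlated normal pair. For more than two dimensions one would normally use a linear algebra library for the Cholesky factor rather than the hand-written 2x2 case used here.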

EDIT:

Note that this leads to the formula listed above, i.e.

$$ C_{\Sigma}(u_1, \ldots, u_d) = \Phi_{\Sigma}(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)), $$

because $U_j = \Phi(Z_j)$ and $\Phi$ is strictly increasing, so $U_j < u_j$ exactly when $Z_j < \Phi^{-1}(u_j)$. Hence

$$ P(U_1 < u_1, \ldots, U_d < u_d \mid \Sigma) = P(Z_1 < \Phi^{-1}(u_1), \ldots, Z_d < \Phi^{-1}(u_d) \mid \Sigma) = \Phi_{\Sigma}(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)), $$

where the last equality follows from how $Z$ is defined.
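This identity can be checked numerically. The sketch below is a rough Monte Carlo check for the bivariate case at the single point $(u_1, u_2) = (0.5, 0.5)$, where the bivariate normal CDF has the known closed form $\Phi_\Sigma(0, 0) = \tfrac14 + \arcsin(\rho)/(2\pi)$. The correlation $\rho = 0.5$, the seed, and the sample size are arbitrary choices for the example.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)
phi = NormalDist()  # standard normal
rho, n = 0.5, 50000
u1 = u2 = 0.5

hits_u = 0  # counts the event {U_1 < u_1, U_2 < u_2}
hits_z = 0  # counts the event {Z_1 < Phi^{-1}(u_1), Z_2 < Phi^{-1}(u_2)}
for _ in range(n):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1, z2 = g1, rho * g1 + math.sqrt(1.0 - rho ** 2) * g2  # correlated normals
    if phi.cdf(z1) < u1 and phi.cdf(z2) < u2:
        hits_u += 1
    if z1 < phi.inv_cdf(u1) and z2 < phi.inv_cdf(u2):
        hits_z += 1

emp_u = hits_u / n
emp_z = hits_z / n
# Closed form for the bivariate normal CDF at the origin; equals 1/3 when rho = 0.5.
exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
print(emp_u, emp_z, exact)
```

The two empirical frequencies describe the same event expressed on the uniform scale and on the normal scale, so they should agree, and both should be close to the closed-form value up to sampling error.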

Cliff AB
  • When you say "the relation between the values of a random variable pushed through it's marginal CDF for each variable", does that describe the scenario of $C_P(u_1, \ldots, u_d) = \boldsymbol{\Phi}_P(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)),$ or the latter 5 steps? Thanks! – user321627 Feb 24 '18 at 02:57
  • @user321627: just updated the answer (although I use the notation $C_\Sigma$, not $C_P$). – Cliff AB Feb 24 '18 at 03:10
  • Thanks!! Just a quick question, is $P(Z_1 > \Phi^{-1}(u_1), ... , Z_d > \Phi^{-1}(u_d) | \Sigma)$ the CDF or do I need a one minus the cdf to get the $>$ signs into $<$? – user321627 Feb 24 '18 at 03:27
  • @user321627: oops, I'm so used to survival analysis. Fixed the mistake, thanks! – Cliff AB Feb 24 '18 at 03:29
  • Thanks! Just wondering, I am trying to match step 5 with $\Phi_{\Sigma}(\Phi^{-1}(u_1),..., \Phi^{-1}(u_d))$. It seems that using $F_j^{-1}$ allows us to create arbitrary multivariate distribution of a specified correlation structure. Where exactly does $F_j^{-1}$ appear in $\Phi_{\Sigma}(\Phi^{-1}(u_1),..., \Phi^{-1}(u_d))$? Do I need to reapply $F_j^{-1}$ at the end as in $\Phi_{\Sigma}(\Phi^{-1}(F_1^{-1}(u_1)),..., \Phi^{-1}(F_d^{-1}(u_d)))$? Thanks! – user321627 Feb 24 '18 at 04:11
  • $F_j^{-1}$ doesn't appear in the formula for the copula function $C$... but remember that the copula function connects uniform(0,1) random variables. So if you want to build a dependency structure between random variables with marginal CDFs $F_j$, then $F^{-1}_j$ connects $u_j$ with the observed value $x_j$, i.e. $x_j = F^{-1}_j(u_j)$. – Cliff AB Feb 24 '18 at 05:42
  • For the dependency structure, if we used $F_j^{-1} = \Phi^{-1}$, it seems that it would then give us correlated values. If I then set a threshold for these uniform values so that any value above 0 is 1, and any value below 0 is 0, then it appears that we have created correlated bernoulli random variables. This doesn't make any sense to me because using $F_j^{-1} = \Phi^{-1}$ cancels the original $\Phi$ transformation, and we are left with just thresholding values from $Z \sim MVN(0, \Sigma)$. Can you tell me what is going on here? – user321627 Mar 14 '18 at 09:00

I'm not sure exactly what the issue is here, but your formula doesn't make sense. Consider the term you suggest: $\Phi(u_1)$. Here $\Phi$ is the cumulative distribution function of a standard Gaussian variable. Its domain runs from $-\infty$ to $\infty$, while $u_1$ ranges from 0 to 1. The probabilities $u_i$ should be plugged into the inverse CDFs, not the CDFs themselves.

Then you suggest plugging the marginal CDFs $\Phi$ into the joint CDF $\boldsymbol{\Phi}_P$, but the joint CDF accepts inputs from $-\infty$ to $\infty$, while the marginal CDFs $\Phi(u_i)$ output probabilities between 0 and 1. Again, you're plugging in the wrong thing: probabilities instead of the variables themselves.

I'm not sure where you got the formula, but there's nothing to discuss here. Your formula doesn't work to start with.

Aksakal