Consider a data frame with three variables: $x_1$, $x_2$, and $z_1$. I want to know whether the correlation between the two $X$ variables depends on $z_1$. This could easily be done with an interaction term in ordinary least squares regression. However, that requires me to choose one of the $X$ variables as the independent variable and the other as the dependent variable, which gives two possible models:
$x_1 = \beta_0 + \beta_1x_2 + \beta_2z_1 + \beta_3x_2z_1 + \epsilon$
or
$x_2 = \beta_0 + \beta_1x_1 + \beta_2z_1 + \beta_3x_1z_1 + \epsilon$
The two $\beta_3$ coefficients are not equivalent, however.
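To illustrate the asymmetry, here is a minimal sketch on simulated stand-in data (the data frame sim, its coefficients, and the seed are made up for illustration; this is not the data from this post). It fits both regressions and pulls out the two interaction coefficients:

```r
# Simulated stand-in data (hypothetical; NOT the data posted below)
set.seed(1)
n  <- 200
z1 <- rnorm(n)
x2 <- rnorm(n)
# Make the x1-x2 relationship depend on z1
x1 <- 0.5 * x2 + 0.3 * z1 + 0.4 * x2 * z1 + rnorm(n, sd = 2)
sim <- data.frame(x1, x2, z1)

fit_a <- lm(x1 ~ x2 * z1, data = sim)  # x1 as the outcome
fit_b <- lm(x2 ~ x1 * z1, data = sim)  # x2 as the outcome

# Interaction coefficient (beta_3) from each model
b3_a <- coef(fit_a)[["x2:z1"]]
b3_b <- coef(fit_b)[["x1:z1"]]
print(c(x1_as_outcome = b3_a, x2_as_outcome = b3_b))
```

The two printed values generally differ, because each one is scaled by the residual structure of a different outcome variable.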
I don't have a good reason to choose one model over the other; instead, I want to model the correlation between $x_1$ and $x_2$ directly and predict it from $z_1$. How can I do this?
I have considered making the correlation a latent variable, such that $x_1$ and $x_2$ both load on this latent variable, with the loadings fixed at one. This models the covariance between the two as the variance of a latent factor. However, the correlation between the two variables is negative, so the latent variance would have to be negative too, and variances cannot be negative.
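To spell out why the factor variance stands in for the covariance, here is a short sketch. The symbols $\eta$ (the latent factor), $e_1$, and $e_2$ (the residuals) are my own notation, intercepts are omitted, and I am assuming the residuals are uncorrelated with each other and with $\eta$:

$$x_1 = \eta + e_1, \qquad x_2 = \eta + e_2$$

$$\operatorname{Cov}(x_1, x_2) = \operatorname{Var}(\eta) + \operatorname{Cov}(\eta, e_2) + \operatorname{Cov}(e_1, \eta) + \operatorname{Cov}(e_1, e_2) = \operatorname{Var}(\eta)$$

Under those assumptions, a negative covariance between $x_1$ and $x_2$ can only be reproduced by a negative $\operatorname{Var}(\eta)$.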
I am using the lavaan R package for this (the dput() output of the data is at the end of this post).
This models the covariance between the two:
> model1 <- "x1 ~~ x2"
> parameterestimates(sem(model1, dat))
lhs op rhs est se z pvalue ci.lower ci.upper
1 x1 ~~ x2 -0.025 0.107 -0.232 0.816 -0.234 0.185
2 x1 ~~ x1 1.353 0.135 10.025 0.000 1.088 1.617
3 x2 ~~ x2 1.697 0.169 10.025 0.000 1.365 2.028
Note that the covariance is -0.025.
I can model this as a latent variable:
> model2 <- "cov =~ 1*x1 + 1*x2"
> parameterestimates(sem(model2, dat))
lhs op rhs est se z pvalue ci.lower ci.upper
1 cov =~ x1 1.000 0.000 NA NA 1.000 1.000
2 cov =~ x2 1.000 0.000 NA NA 1.000 1.000
3 x1 ~~ x1 1.378 0.174 7.914 0.000 1.036 1.719
4 x2 ~~ x2 1.721 0.202 8.512 0.000 1.325 2.118
5 cov ~~ cov -0.025 0.107 -0.232 0.816 -0.234 0.185
Warning message:
In lav_object_post_check(object) :
lavaan WARNING: some estimated lv variances are negative
Note that the variance of cov is equal to the covariance from model1. However, this is negative, which gives me the warning, as obviously variances cannot be negative.
Lastly, I can predict this from z1:
> model3 <- "cov =~ 1*x1 + 1*x2
+ cov ~ z1"
> parameterestimates(sem(model3, dat))
lhs op rhs est se z pvalue ci.lower ci.upper
1 cov =~ x1 1.000 0.000 NA NA 1.000 1.000
2 cov =~ x2 1.000 0.000 NA NA 1.000 1.000
3 cov ~ z1 0.247 0.041 5.995 0.000 0.166 0.328
4 x1 ~~ x1 1.412 0.172 8.218 0.000 1.075 1.749
5 x2 ~~ x2 1.687 0.195 8.653 0.000 1.305 2.069
6 cov ~~ cov -0.140 0.099 -1.411 0.158 -0.335 0.055
7 z1 ~~ z1 1.843 0.000 NA NA 1.843 1.843
Warning message:
In lav_object_post_check(object) :
lavaan WARNING: some estimated lv variances are negative
So z1 now appears to predict the latent variable cov (estimate 0.247, p < .001). Again, the estimated latent variance is negative.
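Writing out what model3 implies, as far as I can tell (again a sketch in my own notation: $\eta$ is the latent variable cov, $\gamma$ is the coefficient of z1, $\zeta$ is the residual of $\eta$, and I assume $\zeta$, $e_1$, and $e_2$ are mutually uncorrelated and uncorrelated with $z_1$):

$$\eta = \gamma z_1 + \zeta, \qquad x_1 = \eta + e_1, \qquad x_2 = \eta + e_2$$

$$\operatorname{Cov}(x_1, x_2) = \gamma^2 \operatorname{Var}(z_1) + \operatorname{Var}(\zeta)$$

$$\operatorname{Cov}(x_1, x_2 \mid z_1) = \operatorname{Var}(\zeta)$$

Plugging in the estimates above gives $0.247^2 \times 1.843 - 0.140 \approx -0.028$, close to the $-0.025$ from model1. If this reading is right, the conditional covariance does not change with $z_1$; the regression path only shifts $x_1$ and $x_2$ by the same amount $\gamma z_1$.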
This feels close to being valid, but I still suspect I'm missing something or doing it incorrectly. Any ideas on how I can predict the correlation between two variables from a third?
dat <- structure(list(x1 = c(6.5, 6, 6.75, 2.5, 6, 7, 5.5, 6, 6, 5.5,
6, 5.5, 6.25, 5.5, 7, 6, 5.75, 6, 6, 4.25, 4, 6, 7, 7, 6, 6,
6.5, 7, 6, 5, 1, 5, 6, 6, 7, 7, 6, 6, 6.75, 7, 6.5, 4.5, 5, 3,
5.5, 3.5, 4, 6, 6.5, 6, 6, 6.5, 6, 5.25, 7, 6, 4, 5.25, 6.5,
5.5, 6.5, 5, 3.75, 4.75, 5, 4.75, 5, 4.75, 6.25, 6, 6, 6, 7,
6, 4.5, 5, 5.5, 4.5, 6, 7, 7, 6.5, 6.5, 6.25, 4, 5.25, 6, 4,
6, 6, 5, 5.5, 5.5, 7, 4.5, 5.5, 5.25, 4.75, 5, 5.5, 5.75, 6.5,
6, 6, 3.5, 6, 5, 5.5, 6, 6, 5, 6, 7, 6, 6, 6.75, 6, 6, 6, 6.25,
7, 6.75, 6, 6, 6, 6, 5.75, 3.5, 5, 4.5, 4.75, 6, 7, 2.5, 6.5,
5.5, 5.5, 5, 5, 7, 5.5, 6, 6, 6.25, 4.25, 7, 5, 4.25, 5.5, 4.75,
5, 7, 6, 6, 5, 2, 4, 6, 5.5, 4.75, 2, 4.5, 6, 6.75, 2.5, 3.5,
6.5, 6.25, 6, 5.5, 5.5, 5, 6, 4.5, 5.5, 5.5, 5, 3, 3, 6.5, 4.75,
5, 6, 4.5, 6, 5.75, 6, 5.5, 4, 4, 6, 1.75, 6.25, 6, 4, 5, 6,
6, 4, 1, 6), x2 = c(3, 2, 2.25, 2.5, 6, 3.75, 1.75, 2.75, 4.5,
3, 4, 2.5, 3.75, 4.5, 1.5, 2, 2.75, 2.5, 2, 3.5, 4, 3.25, 1,
1, 3.75, 5.25, 2, 1.5, 6, 2.5, 1, 1.5, 2, 3.25, 4, 2, 1.25, 1.75,
3.25, 5.5, 1.5, 3.5, 3.25, 1.5, 5, 3.75, 1.5, 1.75, 1.75, 1.5,
1.25, 1, 2, 5.5, 1.5, 1, 3.5, 1.5, 3.25, 1, 3.25, 2, 3.5, 3.25,
4, 1.5, 2.25, 3, 1, 2, 3.75, 4.25, 4.75, 2, 4, 4, 2.5, 2, 2,
2.5, 1, 3, 3.75, 2, 3.25, 3, 2.75, 4, 2, 2, 3.25, 3, 3.5, 2.5,
5.25, 2, 5.25, 3.5, 1, 1.5, 2.75, 2.75, 2.75, 2, 3, 5.5, 3.75,
3, 1, 2, 2, 1, 1, 6, 2, 1.25, 1.5, 1.75, 1, 1.25, 3, 2, 2.25,
2, 1, 1, 2, 2.5, 1.5, 4.75, 4, 3.25, 1, 2.25, 5.25, 4.75, 1,
2.5, 2, 1, 1, 1.5, 2.75, 5.5, 4.75, 1, 3.25, 3.25, 2, 2.75, 5,
1.25, 1.25, 2.5, 4, 2, 1, 1, 2.25, 2.5, 2.5, 4, 4.25, 1, 1, 1,
3, 2.25, 2, 2, 1, 2.5, 2, 6, 4.5, 1, 1, 1, 1.75, 2, 2.5, 1.25,
4.75, 3.75, 1.5, 2.25, 2, 3, 1.25, 3.5, 1, 1, 1, 4, 2.5, 3.5,
1.5, 3.75, 3, 1, 2.25), z1 = c(1, 1.28571428571429, 4.28571428571429,
1, 5.71428571428571, 5.14285714285714, 3.28571428571429, 4.28571428571429,
5.28571428571429, 1.85714285714286, 2.85714285714286, 3, 1.28571428571429,
4.42857142857143, 3.14285714285714, 2.57142857142857, 2, 2, 2.42857142857143,
4.28571428571429, 2.14285714285714, 1.85714285714286, 1.57142857142857,
2.28571428571429, 4.57142857142857, 3, 2.85714285714286, 5, 2,
3.85714285714286, 2, 2.42857142857143, 4, 3.85714285714286, 1.85714285714286,
3.28571428571429, 1, 1.71428571428571, 2.57142857142857, 3.85714285714286,
1.14285714285714, 2.14285714285714, 2.14285714285714, 1.71428571428571,
1.14285714285714, 3.57142857142857, 1.28571428571429, 1, 1.14285714285714,
1.42857142857143, 1.14285714285714, 1, 2.71428571428571, 5.14285714285714,
6.14285714285714, 1, 4.28571428571429, 1, 3.85714285714286, 1.85714285714286,
3.14285714285714, 3, 3.14285714285714, 3.14285714285714, 2.14285714285714,
3.28571428571429, 2.57142857142857, 4.85714285714286, 1.42857142857143,
4.57142857142857, 2.42857142857143, 1.14285714285714, 5.14285714285714,
3.42857142857143, 3.85714285714286, 1.28571428571429, 2.85714285714286,
2.42857142857143, 1.28571428571429, 7, 1.28571428571429, 5.57142857142857,
4.14285714285714, 1.71428571428571, 1.71428571428571, 1.42857142857143,
3.14285714285714, 1, 2.14285714285714, 3.28571428571429, 1.28571428571429,
1.85714285714286, 1.14285714285714, 4.71428571428571, 3.71428571428571,
2.85714285714286, 4, 3.14285714285714, 1, 1.14285714285714, 2.28571428571429,
2.14285714285714, 2.42857142857143, 3.28571428571429, 3.28571428571429,
3, 2.85714285714286, 4.14285714285714, 2.14285714285714, 2.28571428571429,
4.57142857142857, 1.71428571428571, 5, 2.57142857142857, 3, 1.57142857142857,
6.42857142857143, 1, 1, 1.71428571428571, 2.28571428571429, 1.85714285714286,
3.28571428571429, 4.28571428571429, 1, 3.28571428571429, 3.42857142857143,
2, 2, 2.57142857142857, 1.28571428571429, 3.85714285714286, 1.85714285714286,
1, 4.71428571428571, 1.85714285714286, 1.28571428571429, 1.42857142857143,
2.14285714285714, 1, 1, 2, 1.42857142857143, 1, 4.57142857142857,
6, 2.71428571428571, 2.57142857142857, 1.14285714285714, 3.14285714285714,
5, 5.71428571428571, 1.85714285714286, 3.71428571428571, 1.85714285714286,
1.57142857142857, 1.57142857142857, 1.14285714285714, 2.57142857142857,
1.57142857142857, 1.14285714285714, 2.71428571428571, 1.28571428571429,
1.28571428571429, 1.57142857142857, 1, 1, 3.71428571428571, 1.14285714285714,
4.28571428571429, 1.57142857142857, 2.14285714285714, 2.14285714285714,
4.42857142857143, 4, 1, 1.14285714285714, 1, 1, 2, 2.85714285714286,
3.57142857142857, 3.71428571428571, 4, 1.28571428571429, 2.57142857142857,
1.42857142857143, 1.57142857142857, 1.14285714285714, 1.14285714285714,
2, 1, 4, 2.14285714285714, 1.57142857142857, 1.14285714285714,
3.85714285714286, 2.85714285714286, 2, 1, 2.14285714285714)), class = "data.frame", .Names = c("x1",
"x2", "z1"), row.names = c(NA, -201L))