2

Closest question I could find to mine was this one, which doesn't cover it.

Is it possible to specify a correlation between two parameters in a Stan model? Consider a linear regression specified by:

$$ \begin{align} y &\sim Normal(\mu, \sigma) \\ \mu &= \alpha + \beta \times x \\ \alpha &\sim Normal(0, 1) \\ \beta &\sim Normal(0, 1) \\ \sigma &\sim Exponential(1) \end{align} $$

Can I add a parameter for the correlation between $\alpha$ and $\beta$? I know that in random effect models, for example, you can have a variance-covariance matrix for intercepts and slopes. Is there something analogous for fixed effects?

  • 1
    Why not put a prior on the joint distribution of $\alpha$ and $\beta$? That way, you can specify the correlation be whatever you want – Demetri Pananos Mar 22 '24 at 05:28
  • That sounds exactly like what I want. Do you have to specify any new parameters for that? Or can you specify a joint prior? I tried looking at the documentation for "Multivariate Changes of Variables" and didn't understand what it was talking about. Am I looking in the wrong place? Thanks for your help, I'm pretty new to Bayesian stats in case that's not obvious haha. – Corned Beef Hash Map Mar 22 '24 at 17:29

1 Answers1

2

The trick here is to use the multi_norm family of log probability density functions in the model block of your stan program.

Here is an example of putting a joint prior on the coefficients of the linear model. As per your request, the two parameters will be correlated with a correlation of $\rho$.

library(cmdstanr)
#> This is cmdstanr version 0.6.1.9000
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /Users/demetripananos/.cmdstan/cmdstan-2.33.1
#> - CmdStan version: 2.33.1
#> 
#> A newer version of CmdStan is available. See ?install_cmdstan() to install it.
#> To disable this check set option or environment variable CMDSTANR_NO_VER_CHECK=TRUE.
library(tidyverse)

stan_code <- ' data{ int n; int p; vector[n] x; vector[n] y; matrix[p, p] A; } transformed data{ matrix[n, 2] X; X[, 1] = rep_vector(1.0, n); X[, 2] = x; } parameters{ vector[p] beta; real<lower=0> sigma; } model{ sigma ~ exponential(1.0); beta ~ multi_normal_cholesky(rep_vector(0.0, p), A); y ~ normal(X * beta, sigma); } '

n <- 10 x <- rnorm(n) y <- 2*x + 1 + rnorm(n, 0, 0.3)

Construct the desired covariance matrix

Here, prior variances are 1, and correlation between

parameters is 0.8

variances <- diag(c(1, 1)) correlation_matrix <- matrix(c(1, 0.8, 0.8, 1), nrow = 2) covariance_matrix <- variances %% correlation_matrix %% variances A <- chol(covariance_matrix)

stan_data = list(n=n, x=x, y=y, A=A, p=nrow(A))

model <- write_stan_file(stan_code) %>% cmdstan_model() #> /Users/demetripananos/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb_2020.3/build/Makefile.tbb:28: CONFIG: cfg=release arch=arm64 compiler=clang target=macos runtime=cc15.0.0_os14.4 #> ld: warning: duplicate -rpath '/Users/demetripananos/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb' ignored

model$sample(stan_data) #> Running MCMC with 4 sequential chains... #> #> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 1 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 1 finished in 0.0 seconds. #> Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 2 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 2 finished in 0.0 seconds. #> Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 3 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 3 finished in 0.0 seconds. #> Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 4 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 4 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 4 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 4 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 4 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 4 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 4 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 4 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 4 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 4 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 4 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 4 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 4 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 4 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 4 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 4 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 4 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 4 finished in 0.0 seconds. #> #> All 4 chains finished successfully. #> Mean chain execution time: 0.0 seconds. #> Total execution time: 0.6 seconds. #> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail #> lp__ -1.41 -1.01 1.50 1.24 -4.37 0.24 1.00 1503 1425 #> beta[1] 1.01 1.00 0.12 0.11 0.83 1.20 1.00 2141 1595 #> beta[2] 2.03 2.06 0.19 0.16 1.69 2.29 1.00 1789 1310 #> sigma 0.35 0.33 0.13 0.10 0.21 0.58 1.00 1502 1477

Created on 2024-03-23 with reprex v2.0.2