I'm confused by something I found when adding a variable that has no relationship with the DV to a multiple regression with four predictors and one DV ($Y$). If I regress $Y$ onto $X_1$, $X_2$, and $X_3$, the multiple $R$ is smaller than when I add a 4th predictor that has no relationship with $Y$. I didn't think this was possible. I've demonstrated this via a simulation and also more manually with matrix algebra, both shown below. What's even more confusing is that if I specify the 4th variable to have a correlation of $.2$ with the DV, the $R^2$ is smaller than when the 4th variable has a correlation of $0$ with the DV. How is this possible?
### via simulation ###
library(MASS)
library(psych)
rx12 = .2
rx13 = .25
rx14 = .3
rx23 = .35
rx24 = .3
rx34 = .4
rx1y = .15
rx2y = .25
rx3y = .2
rx4y = 0
corr_matrix <- matrix(c(1, rx12, rx13, rx14, rx1y,
rx12, 1, rx23, rx24, rx2y,
rx13, rx23, 1, rx34, rx3y,
rx14, rx24, rx34, 1, rx4y,
rx1y, rx2y, rx3y, rx4y, 1), nrow=5)
corr_matrix #this shows the correlation is zero#
set.seed(33)
data = as.data.frame (mvrnorm(n=1000, mu=c(.0, .0, .0, .0, 0), Sigma=corr_matrix, empirical=TRUE))
psych::corr.test(data)$r #this shows the correlation is zero#
summary(lm(V5 ~ V1 + V2 + V3, data=data)) #R^2 = .0833
summary(lm(V5 ~ V1 + V2 + V3 + V4, data=data)) #R^2 = .1044
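As a quick check on what the 4th variable contributes in the fitted model (a hedged sketch, reusing the same data object from above), one can look directly at its estimated coefficient; given the residual relationship described in the comment at the bottom, it should come out negative here even though its zero-order correlation with $Y$ is exactly $0$:

#sign of the V4 coefficient in the full 4-predictor model
coef(summary(lm(V5 ~ V1 + V2 + V3 + V4, data=data)))["V4", ]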
### matrix multiplication with all 4 variables ###
corr_matrix_x <- matrix(c(1, rx12, rx13, rx14,
rx12, 1, rx23, rx24,
rx13, rx23, 1, rx34,
rx14, rx24, rx34, 1), nrow=4)
corr_matrix_y <- matrix(c(rx1y, rx2y, rx3y, rx4y), nrow=4)
corr_matrix_y #this shows the correlation is zero#
x_inverse <- solve(corr_matrix_x)
betas <- as.matrix(x_inverse %*% corr_matrix_y)
t(betas) %*% corr_matrix_y #R^2 = .1044
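For reference, the quantity computed here (and in the three-predictor block below) is the standard expression for the standardized coefficients and the squared multiple correlation in terms of correlation matrices:

$$\boldsymbol{\beta} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xy}, \qquad R^2 = \boldsymbol{\beta}^{\top}\mathbf{r}_{xy} = \mathbf{r}_{xy}^{\top}\mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xy},$$

where $\mathbf{R}_{xx}$ is the predictor correlation matrix and $\mathbf{r}_{xy}$ is the vector of predictor–criterion correlations.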
### 3 variables ###
corr_matrix_x <- matrix(c(1, rx12, rx13,
rx12, 1, rx23,
rx13, rx23, 1), nrow=3)
corr_matrix_y <- matrix(c(rx1y, rx2y, rx3y), nrow=3)
x_inverse <- solve(corr_matrix_x)
betas <- as.matrix(x_inverse %*% corr_matrix_y)
t(betas) %*% corr_matrix_y #R^2 = .0833
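A sketch of why the gain is possible, using only the $5 \times 5$ correlation matrix already defined above: the increase in $R^2$ from adding $X_4$ equals the squared semipartial correlation of $X_4$ with $Y$ (residualizing $X_4$ on $X_1$–$X_3$), which can be nonzero even when the zero-order correlation $r_{X_4 Y}$ is exactly $0$. The partial correlation can be read off the inverse of the full correlation matrix:

inv_full <- solve(corr_matrix)                      #inverse of the full 5 x 5 correlation matrix
pr4 <- -inv_full[4, 5] / sqrt(inv_full[4, 4] * inv_full[5, 5])
pr4                                                 #partial correlation of X4 with Y given X1-X3; should be negative here
pr4^2 * (1 - .0833)                                 #squared semipartial = gain in R^2, ~ .1044 - .0833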
Follow-up: a friend plotted how the $R^2$ of the four-predictor model changes with the $X_4$–$Y$ correlation. The $R^2$ is larger when that correlation is zero or small (roughly $< .2$), and it only starts increasing again at correlations roughly $> .2$. Image below:

[Figure: $R^2$ of the four-predictor model plotted against the correlation between $X_4$ and $Y$]
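That plot can be reproduced directly from the matrix formula above (a minimal sketch, assuming the friend simply swept the $X_4$–$Y$ correlation while holding the other correlations fixed; Rxx and r4y_grid are helper names introduced here):

Rxx <- corr_matrix[1:4, 1:4]                        #4 x 4 predictor correlations, unchanged by the sweep
r4y_grid <- seq(0, .6, by=.01)                      #candidate values for cor(X4, Y)
r2 <- sapply(r4y_grid, function(r4y) {
  ry <- c(rx1y, rx2y, rx3y, r4y)
  drop(t(ry) %*% solve(Rxx) %*% ry)                 #R^2 = r_xy' Rxx^{-1} r_xy
})
plot(r4y_grid, r2, type="l", xlab="cor(X4, Y)", ylab=expression(R^2))
abline(h=.0833, lty=2)                              #R^2 of the 3-predictor model, for reference

If the decomposition above is right, the curve should bottom out exactly at the three-predictor value of $.0833$ (the point where the partial correlation of $X_4$ with $Y$ given the others is zero) and should only climb back above its value at $r_{X_4 Y}=0$ once the correlation is large enough, which is consistent with $.2$ giving a smaller $R^2$ than $0$ in the question above.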
with(as.data.frame(residuals(lm(cbind(V4, V5) ~ ., data))), {plot(V4, V5); abline(lm(V5 ~ V4))}). This exhibits a clear negative linear relationship between V4 and V5 after controlling for the effects of the other three variables. – whuber Dec 28 '22 at 16:38
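A small addendum to that comment (a hedged check, assuming the same data object): by the Frisch–Waugh–Lovell result, the slope of that residual-on-residual line should equal the V4 coefficient from the full four-predictor regression, so the negative slope in the plot is the same effect that shows up in the fitted model above:

res <- as.data.frame(residuals(lm(cbind(V4, V5) ~ ., data)))
coef(lm(V5 ~ V4, data=res))["V4"]                   #slope of the residual-on-residual line
coef(lm(V5 ~ V1 + V2 + V3 + V4, data=data))["V4"]   #V4 coefficient in the full model; the two should match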