According to this post, the expected correlation between the sampling distributions for the slope and intercept in OLS regression is given by E(Corr) = -E(X) / sqrt(E(X^2)).
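For reference, here is a sketch of where that expression comes from, assuming the standard OLS setup with homoscedastic, uncorrelated errors of variance $\sigma^2$ and $n$ fixed design points $x_1, \dots, x_n$ (the symbols $\hat\beta_0$, $\hat\beta_1$, $\sigma^2$ and $n$ are my notation, not the linked post's). Here E(X) and E(X^2) are read as the sample averages of $x_i$ and $x_i^2$.

```latex
\operatorname{Var}\!\begin{pmatrix}\hat\beta_0\\ \hat\beta_1\end{pmatrix}
  = \sigma^2 (X^\top X)^{-1}
  = \frac{\sigma^2}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}
    \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix},
\qquad
\operatorname{Corr}(\hat\beta_0, \hat\beta_1)
  = \frac{-\sum_i x_i}{\sqrt{n \sum_i x_i^2}}
  = \frac{-\bar{x}}{\sqrt{\overline{x^2}}}.
```

Note that $\sigma^2$ cancels, so under these assumptions the correlation depends only on the design points.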
Now consider an experiment with N trials in which I measure Y at essentially the same X values in every trial, so that E(Corr) is effectively constant across the experiment. I then fit a line to the results of each trial, which gives a population of N slopes and N intercepts.
My initial assumption was that, if X is constant, the slopes and intercepts in this population should be correlated to the same degree as E(Corr); however, my results suggest otherwise.
My question is: Does the correlation between the slopes and intercepts of the population distribution have a statistical explanation similar to E(Corr), or could this result be a unique property of the dataset I am working with? In other words, are the slopes and intercepts of the population distribution naturally correlated or is this only true if certain conditions are met?
Edit: Here is some R code to demonstrate what I mean:
set.seed(89112)
# Number of trials
n <- 10000
# x is constant across trials
x <- c(1, 10, 30, 60, 100, 200)
# Pre-allocate matrix to save results (column 1: intercept, column 2: slope)
res <- matrix(NA, nrow = n, ncol = 2)
# Run loop
for (i in 1:n) {
  # Generate sample (the true slope is drawn anew from N(0, 0.01^2) in each trial)
  y <- 1 + rnorm(1, 0, 0.01) * x + rnorm(length(x))
  # Fit linear model
  mod <- lm(y ~ x)
  # Save intercept
  res[i, 1] <- coef(mod)[1]
  # Save slope
  res[i, 2] <- coef(mod)[2]
}
# Correlation between populations of slopes and intercepts
cor.test(res[, 1], res[, 2])
> -0.3677728
# Correlation between sampling distributions of slopes and intercepts
-mean(x) / sqrt(mean(x^2))
> -0.7005973
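One way to probe the discrepancy is to compare a simulation where the true slope is held fixed with one where it varies between trials, as it does above. This is only a sketch under the assumptions of the code above (unit error variance, true intercept 1). The helper names `sim_cor` and `theory_cor` are hypothetical, introduced here for illustration. `theory_cor` assumes the between-trial slope variation is independent of the measurement noise, so it adds only to the variance of the fitted slopes and leaves the covariance with the intercepts unchanged.

```r
set.seed(89112)
n_trials <- 10000
x <- c(1, 10, 30, 60, 100, 200)

# Hypothetical helper: simulate n_trials fits and return the empirical
# correlation between fitted intercepts and fitted slopes.
# slope_sd is the between-trial SD of the true slope (0 = fixed slope).
sim_cor <- function(slope_sd) {
  est <- t(replicate(n_trials, {
    slope <- rnorm(1, 0, slope_sd)
    y <- 1 + slope * x + rnorm(length(x))
    coef(lm(y ~ x))
  }))
  cor(est[, 1], est[, 2])
}

# Hypothetical helper: correlation implied by the OLS covariance matrix,
# with slope_sd^2 added to the slope variance (assumed independent of noise).
theory_cor <- function(slope_sd, sigma = 1) {
  sxx <- sum((x - mean(x))^2)
  var_b0 <- sigma^2 * sum(x^2) / (length(x) * sxx)
  var_b1 <- sigma^2 / sxx + slope_sd^2
  cov_b01 <- -sigma^2 * mean(x) / sxx
  cov_b01 / sqrt(var_b0 * var_b1)
}

r_fixed  <- sim_cor(0)     # true slope identical in every trial
r_random <- sim_cor(0.01)  # true slope varies between trials, as in the code above

c(simulated = r_fixed,  theory = theory_cor(0))
c(simulated = r_random, theory = theory_cor(0.01))
```

With `slope_sd = 0`, `theory_cor` reduces algebraically to `-mean(x) / sqrt(mean(x^2))`. Any between-trial spread in the true slope inflates the variance of the fitted slopes and so pulls the correlation toward zero.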
rnorm(1,0,0.01)*and try it again. – whuber Dec 13 '23 at 19:01