Let me give a short introduction first.
It's well known that in randomized pre-post trials, by the formal pharmaceutical regulatory guidelines, both raw post values and change scores should be adjusted for the baseline, so ANCOVA of either the post value or the change score is the default analytical method. The adjustment for the baseline is unconditional and addresses 1) the statistical artefact of a "fake difference at baseline" that can appear even when proper randomization took place, and 2) the effect of regression to the mean, which it reduces.
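To make that setup concrete, here is a minimal sketch (my own illustration, on simulated data) of the two equivalent ANCOVA specifications: with the baseline as a covariate, analysing the post value or the change score gives the identical treatment-effect estimate.
set.seed(1)
n    <- 200
arm  <- rep(0:1, each = n/2)                    # randomized treatment arm
pre  <- rnorm(n)                                # baseline value
post <- 0.6*pre + 0.5*arm + rnorm(n, sd = 0.8)  # post value, true effect = 0.5
coef(lm(post ~ pre + arm))["arm"]               # ANCOVA of the post value
coef(lm(post - pre ~ pre + arm))["arm"]         # ANCOVA of the change score - same estimate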
But in non-randomized trials the groups likely come from separate populations, so baseline differences are totally expected - it would rather be weird not to have them. That was the case in about 95% of the projects I analysed over the last 25 years. Here the use of ANCOVA on post values is strongly discouraged because of Lord's paradox and other biases, so the ANOVA of change scores remains the only valid method - and now the problems start.
Of course, in non-randomized trials, for both the post values and the change scores, it's entirely nonsensical to adjust for the baseline, as the baseline differences are real and the adjustment diminishes the possible treatment effect.
But, at the same time, it's a known fact that change scores are negatively correlated with the pre values. If the pre and post values are uncorrelated (with equal variances), the correlation of change with pre is about -0.71, which follows from simple vector algebra.
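A small sketch of that algebra (my own check, assuming equal variances of pre and post): with r = cor(pre, post), cor(pre, post - pre) = (r - 1)/sqrt(2(1 - r)) = -sqrt((1 - r)/2), which gives -1/sqrt(2), i.e. about -0.71, at r = 0.
pre_change_cor <- function(r) (r - 1) / sqrt(2 * (1 - r))  # assumes Var(pre) = Var(post)
round(pre_change_cor(c(0, 0.9, 0.5, -0.5, -0.9)), 3)
# -0.707 -0.224 -0.500 -0.866 -0.975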
It's easy to show empirically. Auxiliary lines are drawn at 0 (vertical) and -0.71 (horizontal).
set.seed(100)
# 100 simulations of 1000 uncorrelated pre (x1) and post (x2) values;
# for each, the correlation of pre with post and of pre with change (post - pre)
with(do.call(rbind, lapply(1:100, function(i) {
  x1 <- rnorm(1000)
  x2 <- rnorm(1000)
  data.frame(prepost   = cor.test(x1, x2)$estimate,
             prechange = cor.test(x1, x2 - x1)$estimate)})),
  plot(prepost, prechange, xlim = c(-.1, .1), ylim = c(-1, 0),
       main = "Correlation pre vs. change"))
abline(h = -0.71)
abline(v = 0)
Now, tell me please: how should this strong correlation be addressed when analysing the change scores in non-randomized trials? This statistical artefact occurs practically always - only its magnitude differs, depending on the initial pre-post correlation - except when that correlation is close to 1.
Is adjusting for the baseline (regardless of everything) a way to handle it?
Kindly, I want to save our time and avoid any irrelevant ideological discussions in the spirit of "you shouldn't adjust the change for baseline, because ...". I know those discussions and arguments and they don't bring any value here. The adjustment is requested by the regulatory guidelines and is also well justified from a statistical perspective, so let's focus ONLY on this kind of correlation.
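To make the question concrete, here is a minimal sketch (my own illustration, simulated data) of what I mean by adjusting: the part of the change score not explained by the baseline - the residual from regressing change on pre - is uncorrelated with the baseline by construction.
set.seed(100)
pre    <- rnorm(1000)
post   <- 0.5*pre + rnorm(1000)
change <- post - pre
cor(pre, change)                   # strongly negative, as above
cor(pre, resid(lm(change ~ pre)))  # essentially zero by construction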
Examples:
R = 0.9
library(MASS)  # for mvrnorm
set.seed(100)
# 100 simulations of bivariate normal pre/post values with correlation 0.9
with(do.call(rbind, lapply(1:100, function(i) {
  x  <- mvrnorm(n = 1000, mu = rep(0, 2),
                Sigma = matrix(.9, nrow = 2, ncol = 2) + diag(2)*.1)
  x1 <- x[, 1]
  x2 <- x[, 2]
  data.frame(prepost   = cor.test(x1, x2)$estimate,
             prechange = cor.test(x1, x2 - x1)$estimate)})),
  plot(prepost, prechange, xlim = c(-1, 1), ylim = c(-1, 1),
       main = "Correlation pre vs. change.\nR(pre, post)=0.9"))
abline(h = -0.71)
abline(v = 0)
R = 0.5
R = -0.5
and finally R = -0.9
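For these remaining values, a compact sketch of my own that loops over R = 0.9, 0.5, -0.5 and -0.9 and compares the simulated mean cor(pre, change) with the closed form -sqrt((1 - R)/2):
library(MASS)
set.seed(100)
for (r in c(0.9, 0.5, -0.5, -0.9)) {
  Sigma <- matrix(c(1, r, r, 1), nrow = 2)      # unit variances, correlation r
  sims  <- replicate(100, {
    x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma)
    cor(x[, 1], x[, 2] - x[, 1])                # cor(pre, change)
  })
  cat(sprintf("R = %4.1f   simulated: %6.3f   theoretical: %6.3f\n",
              r, mean(sims), -sqrt((1 - r)/2)))
}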




