Stephan Kolassa makes an important point and provides a good example of this being a common phenomenon, but I think I can actually explain it. It is a kind of weak Simpson's paradox (https://en.wikipedia.org/wiki/Simpson's_paradox), which is a) not limited to subgroups and b) only suppresses the "effect" rather than wholly inverting it.
Consider the classic example of grad school admission (DV) at Berkeley in the 1970s, where being a woman (IV) was correlated with applying to a more competitive department (CV). The marginal correlation between DV and IV was negative, but once adjusted for department it actually turned positive. However, if the correlation between IV and CV, or the adjusted correlation between CV and DV, had been smaller, we might have seen no unadjusted (marginal) correlation between IV and DV at all. That would translate to $R^2 = 0$, and then obviously $\Delta R^2 > R^2$.
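To make that last step concrete, here is a minimal toy simulation (hypothetical numbers, not the actual Berkeley data) in which a positive adjusted effect of IV is roughly cancelled by the indirect path through CV, so the marginal $R^2$ is essentially zero while $\Delta R^2$ stays clearly positive:

# Toy numbers only: the direct effect of IV is +1, but IV also raises
# P(CV = 1) by 0.4 and CV = 1 lowers DV by 2.5, so the indirect path is
# about 0.4 * (-2.5) = -1 and the marginal association roughly cancels.
set.seed(1)
n  <- 1e5
IV <- rbinom(n, 1, 0.5)
CV <- rbinom(n, 1, 0.3 + 0.4 * IV)
DV <- 1 * IV - 2.5 * CV + rnorm(n)

summary(lm(DV ~ IV))$r.squared                 # approximately 0
summary(lm(DV ~ IV + CV))$r.squared -
  summary(lm(DV ~ CV))$r.squared               # clearly > 0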
In Stephan Kolassa's example the adjusted effects are both positive and the correlation between IV and CV is (purely by chance) negative, which means the marginal effect of IV on DV is suppressed because there are fewer CV = B observations at high IV values. If we simulate the setup 1000 times, this becomes apparent:
nn <- 20
res <- replicate(1000, {
  IV <- runif(nn)
  CV <- as.factor(rep(c("A", "B"), each = nn / 2))
  DV <- 1 * IV + 2 * as.numeric(CV) - 0.0 * IV * as.numeric(CV) + rnorm(nn, 0, 1)  # interaction switched off
  # return Cor(IV, CV) and Delta R^2 - R^2, where Delta R^2 is the R^2 gain
  # from adding IV on top of CV
  c(cor(as.numeric(CV), IV),
    (summary(lm(DV ~ IV + CV))$r.squared - summary(lm(DV ~ CV))$r.squared) -
      summary(lm(DV ~ IV))$r.squared)
})
plot(res[1, ], res[2, ], xlab = "Cor(IV, CV)", ylab = "deltaR2 - R2")
abline(h = 0, v = 0)
[Plot: $\Delta R^2 - R^2$ is negatively correlated with Cor(IV, CV)]
Now this is messy, because the $\hat{\beta}$s have high variance, and if the sign of one of them flips, so does the effect. We can, however, see that if IV and CV are uncorrelated, then $\Delta R^2 - R^2 = 0$, which, if you have a strong intuition about the geometry of a linear model, follows from the Pythagorean theorem.
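To spell out the orthogonal case: if the (centered) IV column is uncorrelated with the CV dummy in the sample, the regression sums of squares simply add, so
$$R^2_{IV+CV} = R^2_{IV} + R^2_{CV} \quad\Rightarrow\quad \Delta R^2 = R^2_{IV+CV} - R^2_{CV} = R^2_{IV} = R^2,$$
and the difference plotted above is exactly zero.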
If we increase the signal-to-noise ratio by using a smaller $\sigma$, the picture becomes much clearer:
nn <- 20
res <- replicate(1000, {
  IV <- runif(nn)
  CV <- as.factor(rep(c("A", "B"), each = nn / 2))
  DV <- 4 * IV + 2 * as.numeric(CV) - 0.0 * IV * as.numeric(CV) + rnorm(nn, 0, 0.2)  # stronger signal, less noise
  c(cor(as.numeric(CV), IV),
    (summary(lm(DV ~ IV + CV))$r.squared - summary(lm(DV ~ CV))$r.squared) -
      summary(lm(DV ~ IV))$r.squared)
})
plot(res[1, ], res[2, ], xlab = "Cor(IV, CV)", ylab = "deltaR2 - R2")
abline(h = 0, v = 0)
[Plot: $\Delta R^2 - R^2$ is strongly negatively correlated with Cor(IV, CV); it is almost a straight line]
In conclusion: looking at DV ~ IV only tells you about the sum of the direct effect of IV on DV and the indirect effect of IV on CV and then on DV. This indirect effect can point in the same direction ($\Delta R^2 < R^2$), point in the opposite direction but be of moderate size ($\Delta R^2 > R^2$), or outright overpower the direct effect, as at Berkeley, in which case $R^2$ is not a very sensible metric. The indirect effect is basically the product of the IV -> CV association and the direct effect of CV on DV, so it is positive if both have the same sign and negative if they do not. Please consult causal theory to figure out whether marginal or adjusted effects are the right ones for your question. After all, cigarettes are perfectly healthy if you adjust for the state of people's lungs.
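That "product" claim is just the omitted-variable identity of OLS: if the full model is $DV = \beta_0 + \beta_{IV} IV + \beta_{CV} CV + \varepsilon$, then the slope from DV ~ IV alone equals $\hat\beta_{IV} + \hat\beta_{CV}\hat\delta$, where $\hat\delta$ is the slope from regressing CV on IV. A quick sketch checking this in R (the coefficients are arbitrary made-up numbers, and CV is continuous here for simplicity):

# Marginal slope = direct (adjusted) slope + beta_CV * slope of CV on IV.
set.seed(42)
n  <- 1e4
IV <- rnorm(n)
CV <- 0.6 * IV + rnorm(n)              # IV -> CV path
DV <- 1 * IV + 2 * CV + rnorm(n)       # direct effect 1, CV effect 2

fit_full <- lm(DV ~ IV + CV)
fit_marg <- lm(DV ~ IV)
fit_aux  <- lm(CV ~ IV)

coef(fit_marg)["IV"]                                               # marginal slope
coef(fit_full)["IV"] + coef(fit_full)["CV"] * coef(fit_aux)["IV"]  # identical (exact OLS identity)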
P.S. I have used the term "direct effect", which should be read as "adjusted"; the effects are of course just correlations, not causations.