I'm reading Counterfactuals and Causal Inference by Morgan and Winship. In chapter 6, they discuss OLS as a means of estimating the average treatment effect (ATE) for a binary exposure $D$ (assuming the assumptions needed to identify it are met). They describe a scenario in which the population can be perfectly stratified by a variable $S \in \left\{ 1, 2, 3 \right\}$ that is sufficient to block all backdoor paths. If one regresses the outcome on $D$ and dummy-coded $S$, using $S=1$ as the reference group, the OLS estimate equals
$$ \delta_{OLS} = \dfrac{1}{c}\sum_s \operatorname{Var}[d_i\mid s_i=s] \operatorname{Pr}[s_i=s] \left\{ E[y_i \mid d_i=1, s_i=s] - E[y_i \mid d_i=0, s_i=s] \right\} $$
where $c$ is a normalizing constant equal to the sum over strata of $\operatorname{Var}[d_i \mid s_i = s]\operatorname{Pr}[s_i = s]$, so that the weights sum to one.
This is fairly surprising to me. I was under the impression that the OLS coefficient would estimate the ATE averaged over the distribution of $S$ (i.e. the weights would simply be $\operatorname{Pr}[s_i=s]$), but it seems that OLS gives more weight to strata in which the propensity score is closer to 0.5, i.e. where $\operatorname{Var}[d_i \mid s_i = s]$ is largest. In the authors' words, "[OLS] can yield estimates that are far from the true ATE even in an infinite sample".
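To make this concrete, here is a small simulation I put together (not from the book; all parameter values are made up) with three strata that have different propensity scores and different stratum-specific effects, regressing $y$ on $d$ plus dummies for $s=2$ and $s=3$:

```python
# Minimal simulation sketch (my own construction, not from the book): three
# strata with different propensity scores and different stratum-specific
# effects. Checks whether the OLS coefficient on d matches the conditional-
# variance-weighted average or the Pr[s]-weighted ATE. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

p_s   = np.array([0.5, 0.3, 0.2])   # Pr[s_i = s] for s = 1, 2, 3
e_s   = np.array([0.1, 0.5, 0.8])   # propensity Pr[d_i = 1 | s_i = s]
tau_s = np.array([1.0, 3.0, 5.0])   # stratum-specific treatment effects
mu_s  = np.array([0.0, 2.0, 4.0])   # stratum-specific baselines

s = rng.choice([1, 2, 3], size=n, p=p_s)
d = rng.binomial(1, e_s[s - 1])
y = mu_s[s - 1] + tau_s[s - 1] * d + rng.normal(size=n)

# OLS of y on d plus dummies for s = 2 and s = 3 (s = 1 is the reference group)
X = np.column_stack([np.ones(n), d, (s == 2).astype(float), (s == 3).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_ols = beta[1]

# Empirical stratum-specific effects, stratum shares, and Var(d | s)
effects = np.array([y[(s == k) & (d == 1)].mean() - y[(s == k) & (d == 0)].mean()
                    for k in (1, 2, 3)])
shares  = np.array([(s == k).mean() for k in (1, 2, 3)])
var_d   = np.array([d[s == k].var() for k in (1, 2, 3)])

ate          = np.sum(shares * effects)                  # Pr[s]-weighted ATE
w            = var_d * shares / np.sum(var_d * shares)   # Var(d|s)*Pr[s] weights
delta_theory = np.sum(w * effects)

print(f"OLS coefficient on d:            {delta_ols:.4f}")
print(f"Pr[s]-weighted ATE:              {ate:.4f}")
print(f"Var(d|s)*Pr[s]-weighted average: {delta_theory:.4f}")
```

When I run this, the OLS coefficient and the variance-weighted average both come out around 2.8, while the $\operatorname{Pr}[s_i=s]$-weighted ATE is about 2.4, which matches the book's claim.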
Why does OLS perform this kind of conditional-variance weighting? Can someone demonstrate why it is a consequence of the usual OLS setup?