I will look at something a bit more general: $Y$ is continuous, $X$ is binary, and $Z$ has $K$ levels. (Update: I don't actually use the assumption that $X$ is binary, but (a) I didn't know I wouldn't, and (b) it avoids questions about correct model specification.)
Start with what I will call the adjusted model:
$$E[Y]=\beta X+\sum_{k=1}^K \gamma_k I(Z=k)$$
By a standard result for linear models, the Frisch-Waugh-Lovell theorem, you can fit this model in two steps. First, regress $Y$ on $Z$ and $X$ on $Z$, then regress the residuals from the $Y$ model on the residuals from the $X$ model.
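Here is a quick numerical check of the two-step fit. It is a minimal sketch on simulated data. The sample size, the four levels of $Z$ and the coefficients are arbitrary assumptions, not anything from a real dataset. `lm(y ~ 0 + x + z)` is the adjusted model written without a separate intercept, so $Z$ gets all $K$ indicators:

```r
set.seed(1)
n <- 200
z <- factor(sample(1:4, n, replace = TRUE))                       # Z with K = 4 levels
x <- as.numeric(rbinom(n, 1, prob = c(0.2, 0.4, 0.6, 0.8)[as.integer(z)]))  # X depends on Z
y <- 1 + 0.5 * x + as.integer(z) + rnorm(n)                       # Y depends on X and Z

# Adjusted model: E[Y] = beta X + sum_k gamma_k I(Z = k)
beta_adjusted <- coef(lm(y ~ 0 + x + z))[["x"]]

# Frisch-Waugh-Lovell: residuals of Y on Z, regressed on residuals of X on Z
r_y <- resid(lm(y ~ z))
r_x <- resid(lm(x ~ z))
beta_fwl <- coef(lm(r_y ~ r_x))[["r_x"]]

all.equal(beta_adjusted, beta_fwl)
```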
The residuals $r_Y$ from the $Y$ model are $Y$ centred at zero within each level of $Z$, i.e., $Y$ minus the mean of $Y$ for the corresponding level of $Z$. The residuals $r_X$ from the $X$ model are likewise $X$ centred within each level of $Z$. Econometricians would call the regression of $r_Y$ on $r_X$ the *within estimator*, since it only looks at the relationship between $X$ and $Y$ within levels of $Z$.
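This sketch checks that these residuals really are within-level centring, using made-up simulated data. Base R's `ave()` returns each observation's group mean, so `y - ave(y, z)` is $Y$ centred within levels of $Z$:

```r
set.seed(2)
n <- 100
z <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- as.numeric(rbinom(n, 1, 0.5))
y <- rnorm(n, mean = as.integer(z))

# Residuals from regressing on the factor Z ...
r_y <- resid(lm(y ~ z))
r_x <- resid(lm(x ~ z))

# ... equal each variable minus its mean within the observation's level of Z
centred_y <- y - ave(y, z)
centred_x <- x - ave(x, z)

all.equal(unname(r_y), centred_y)   # unname(): resid() returns a named vector
all.equal(unname(r_x), centred_x)
```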
We have an explicit form for the within estimator $\hat\beta$:
$$\hat\beta = \frac{\mathrm{cov}(r_Y,r_X) }{\mathrm{var}(r_X)}$$
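The same formula in code is sketched below, on simulated data with all values assumed for illustration. The within estimator computed from `cov()` and `var()` of the centred variables should agree with the `x` coefficient of the adjusted `lm()` fit:

```r
set.seed(3)
n <- 300
z <- factor(sample(1:5, n, replace = TRUE))
x <- as.numeric(rbinom(n, 1, prob = seq(0.1, 0.9, length.out = 5)[as.integer(z)]))
y <- 2 * as.integer(z) - 0.3 * x + rnorm(n)

# Z-centred Y and X, i.e. r_Y and r_X
r_y <- y - ave(y, z)
r_x <- x - ave(x, z)

beta_within   <- cov(r_y, r_x) / var(r_x)
beta_adjusted <- coef(lm(y ~ x + z))[["x"]]
all.equal(beta_within, beta_adjusted)
```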
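Next, consider the stratified models $E[Y]=\alpha_k+\beta_k X$, fitted separately in each stratum $Z=k$. If we centre $Y$ and $X$ within the stratum, we change $\alpha_k$ (to zero) but not $\beta_k$. The centred variables are then exactly $r_Y$ and $r_X$ restricted to stratum $k$, so within stratum $k$
$$\hat\beta_k=\frac{\mathrm{cov}_k(r_Y,r_X)}{\mathrm{var}_k(r_X)}$$
where $\mathrm{cov}_k$ and $\mathrm{var}_k$ are computed from the observations in stratum $k$ only.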
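This sketch checks both claims in each stratum of some simulated data, made up with a different slope per stratum. First, the `lm()` slope is unchanged by centring. Second, it equals the within-stratum covariance over variance:

```r
set.seed(4)
n <- 300
z <- factor(sample(1:3, n, replace = TRUE))
x <- as.numeric(rbinom(n, 1, 0.5))
y <- as.integer(z) + c(0.5, 1, -0.2)[as.integer(z)] * x + rnorm(n)

slopes <- sapply(levels(z), function(k) {
  xk <- x[z == k]
  yk <- y[z == k]
  rx <- xk - mean(xk)   # X centred within stratum k
  ry <- yk - mean(yk)   # Y centred within stratum k
  c(raw     = coef(lm(yk ~ xk))[[2]],   # slope of E[Y] = alpha_k + beta_k X
    centred = coef(lm(ry ~ rx))[[2]],   # slope after centring (intercept becomes 0)
    formula = cov(ry, rx) / var(rx))    # cov_k(r_Y, r_X) / var_k(r_X)
})
slopes
```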
Finally, we combine the $\hat\beta_k$. We want a precision-weighted average, that is, we want to weight each stratum-specific $\hat\beta_k$ by the reciprocal of its variance. Here is the only place where we cheat. The variance of $\hat\beta_k$ is actually $\sigma^2_k/((n_k-1)\mathrm{var}_k(r_X))$, where $\sigma^2_k$ is the residual variance and $n_k$ is the size of stratum $k$. We will instead use variances proportional to $1/((n_k-1)\mathrm{var}_k(r_X))$, which is equivalent to assuming that the residual variance is the same in every stratum.
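For reference, the usual OLS variance of a slope is $\hat\sigma^2_k/\sum_{i\in k}(x_i-\bar x_k)^2=\hat\sigma^2_k/((n_k-1)\mathrm{var}_k(r_X))$. The sketch below compares that formula with `vcov()` for a single simulated stratum. The stratum size, slope and residual SD are assumptions:

```r
set.seed(5)
nk <- 60
xk <- as.numeric(rbinom(nk, 1, 0.4))
yk <- 1 + 0.7 * xk + rnorm(nk, sd = 2)   # residual SD sigma_k = 2 (assumed)

fit_k <- lm(yk ~ xk)
sigma2_hat <- sigma(fit_k)^2                             # estimate of sigma_k^2
var_from_lm <- vcov(fit_k)["xk", "xk"]                   # model-based var(beta_k hat)
var_from_formula <- sigma2_hat / ((nk - 1) * var(xk))    # sigma_k^2 / ((n_k - 1) var_k(r_X))
all.equal(var_from_lm, var_from_formula)
```

If $\sigma^2_k$ is the same in every stratum, the reciprocal of this variance is proportional to $(n_k-1)\mathrm{var}_k(r_X)$, which is why that is the weight used.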
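Write $w_k=(n_k-1)\mathrm{var}_k(r_X)$ for the precision weights. Since $\hat\beta_k=\mathrm{cov}_k(r_Y,r_X)/\mathrm{var}_k(r_X)$, we have
$$\sum_k w_k\hat\beta_k = \sum_k (n_k-1)\mathrm{cov}_k(r_X,r_Y)$$
and in the denominator
$$\sum_k w_k = \sum_k (n_k-1)\mathrm{var}_k(r_X)$$
Because $r_X$ and $r_Y$ have mean zero within every stratum, $(n_k-1)\mathrm{cov}_k(r_X,r_Y)=\sum_{i\in k} r_{X,i}\,r_{Y,i}$ and $(n_k-1)\mathrm{var}_k(r_X)=\sum_{i\in k} r_{X,i}^2$. Summing over strata gives, lo and behold,
$$\frac{\sum_k w_k\hat\beta_k}{\sum_k w_k}=\frac{\sum_i r_{X,i}\,r_{Y,i}}{\sum_i r_{X,i}^2}=\frac{\mathrm{cov}(r_Y,r_X)}{\mathrm{var}(r_X)}$$
where the sums now run over all $n$ observations, and the common factor $n-1$ cancels in the last step.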
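This sketch checks the whole argument numerically, on simulated data with six strata (all values assumed). The precision-weighted average of the stratum-specific slopes, with $w_k=(n_k-1)\mathrm{var}_k(r_X)$, should reproduce the adjusted coefficient:

```r
set.seed(6)
n <- 500
z <- factor(sample(1:6, n, replace = TRUE))
x <- as.numeric(rbinom(n, 1, prob = seq(0.15, 0.85, length.out = 6)[as.integer(z)]))
y <- 0.5 * as.integer(z) - 0.2 * x + rnorm(n)

strata <- levels(z)
# Stratum-specific slopes beta_k hat from E[Y] = alpha_k + beta_k X
beta_k <- sapply(strata, function(k) coef(lm(y[z == k] ~ x[z == k]))[[2]])
# Precision weights w_k = (n_k - 1) var_k(r_X)
w_k <- sapply(strata, function(k) (sum(z == k) - 1) * var(x[z == k]))

beta_stratified <- sum(w_k * beta_k) / sum(w_k)
beta_adjusted   <- coef(lm(y ~ x + z))[["x"]]
all.equal(beta_stratified, beta_adjusted)
```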
So the adjusted $\hat\beta$ is the same as the within estimator, which is the same as the precision-weighted stratified $\hat\beta$.
Now, an example
This is an old biostatistics dataset on lung function ($FEV_1$) and smoking in some children in Boston in the mid-1970s. We're interested in the relationship between smoking and $FEV_1$.
```r
> fev<-read.table("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/fev.txt", header=TRUE)
> lm(fev~smoking,data=fev)

Call:
lm(formula = fev ~ smoking, data = fev)

Coefficients:
(Intercept)      smoking
     2.6346       0.6054
```
That's higher FEV1 in kids who smoke, which is ... surprising?
The answer is confounding by age:

```r
plot(fev ~ jitter(age), data = fev, pch = 19,
     col = ifelse(smoking == 1, "purple", "orange"))
```

We might do an adjusted analysis:

```r
> lm(fev~smoking+factor(age),data=fev)

Call:
lm(formula = fev ~ smoking + factor(age), data = fev)

Coefficients:
  (Intercept)        smoking   factor(age)7   factor(age)8   factor(age)9
       1.6580        -0.1757         0.2115         0.4573         0.7761
factor(age)10  factor(age)11  factor(age)12  factor(age)13  factor(age)14
       1.0430         1.3989         1.5870         1.8784         1.9735
factor(age)15  factor(age)16  factor(age)17
       1.9193         2.1055         2.6824
```
or a stratified analysis:

```r
> betas<-sapply(unique(fev$age), function(age_i) coef(lm(fev~smoking, subset=age==age_i,data=fev))["smoking"])
>
> weights<-sapply(unique(fev$age), function(age_i) var(fev$smoking[fev$age==age_i])*(sum(fev$age==age_i)-1))
>
> weighted.mean(betas,weights)
[1] -0.1756593
```
Note that this works even though some of the strata don't have enough smokers to estimate $\beta_k$. Those strata contribute nothing to the adjusted estimate either, since their $r_X$ are all zero. In the stratified analysis they get a weight of zero.
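This sketch shows the zero-weight case on simulated data shaped loosely like the FEV example. The ages, the 30% smoking rate from age 10 and the coefficients are all made up, and `sim` is a hypothetical data frame, not the real data. It follows the same stratified recipe. It filters zero-weight strata out explicitly, though, since in R `NA * 0` is still `NA`:

```r
set.seed(7)
sim <- data.frame(age = sample(6:17, 400, replace = TRUE))
# Assumption for illustration: nobody under 10 smokes; 30% smoke from age 10 up
sim$smoking <- as.numeric(sim$age >= 10 & runif(400) < 0.3)
sim$fev <- 0.25 * sim$age - 0.2 * sim$smoking + rnorm(400, sd = 0.5)

ages <- sort(unique(sim$age))
# Stratum-specific slopes: NA where smoking doesn't vary within the age stratum
betas <- sapply(ages, function(a)
  coef(lm(fev ~ smoking, data = sim, subset = age == a))[["smoking"]])
# Precision weights (n_k - 1) var_k(smoking): exactly 0 in those same strata
weights <- sapply(ages, function(a)
  (sum(sim$age == a) - 1) * var(sim$smoking[sim$age == a]))

ok <- weights > 0   # keep only strata that carry information about beta
beta_stratified <- sum(betas[ok] * weights[ok]) / sum(weights)
beta_adjusted <- coef(lm(fev ~ smoking + factor(age), data = sim))[["smoking"]]
all.equal(beta_stratified, beta_adjusted)
```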
In this example, the confounding relationship is quite close to being linear and additive, so you get a point estimate that's quite close by adjusting for a linear term in age:
```r
> lm(fev~smoking+age,data=fev)

Call:
lm(formula = fev ~ smoking + age, data = fev)

Coefficients:
(Intercept)      smoking          age
     0.2040      -0.2358       0.2479
```
This analysis is importantly different: it uses all the data, including ages where $\beta_k$ can't be estimated directly. It relies on extrapolating the linear FEV:age relationship to hypothetical younger smokers. Realistically, if you want a linear term in age, you should restrict to age $\geq 10$, or even further, since that's where you actually have smokers. One of the advantages of thinking in terms of stratification is that it motivates this sort of restriction.
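This is a sketch of the restriction, on made-up data in which, by assumption, nobody under 10 smokes. `sim` is hypothetical, not the real FEV data. With the real data you would add the same `subset = age >= 10` argument to `lm()`:

```r
set.seed(8)
sim <- data.frame(age = sample(6:17, 400, replace = TRUE))
sim$smoking <- as.numeric(sim$age >= 10 & runif(400) < 0.3)   # assumed: no smokers under 10
sim$fev <- 0.25 * sim$age - 0.2 * sim$smoking + rnorm(400, sd = 0.5)

# Linear-in-age adjustment on all ages: leans on extrapolation to ages without smokers
fit_all <- lm(fev ~ smoking + age, data = sim)
# Same model, restricted to ages where both smokers and non-smokers occur
fit_restricted <- lm(fev ~ smoking + age, data = sim, subset = age >= 10)

rbind(all_ages = coef(fit_all), age_10_plus = coef(fit_restricted))
```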
Finally, generalisations.
Lin & Zeng have a couple of papers showing that this equivalence holds (approximately) very generally. Their goal was to show that meta-analysis of study-specific estimates has (nearly) full efficiency compared with pooled analysis of the individual-level data.