How would one prove that the expected value of the residuals from OLS
regression is zero?
In the linear regression framework, many problems can emerge from the so-called error term. However, here you are speaking unambiguously about residuals in the OLS context.
The expected value of the residuals is then zero by construction. The algebra demands it: the result comes from the first-order conditions of the OLS optimization problem (provided the regression includes an intercept, or more generally a constant in the column space of the regressors). The usual assumptions about the error term play no role.
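The first-order conditions mentioned above can be made explicit (a sketch, assuming $\boldsymbol{b}$ is the OLS estimate and $\boldsymbol{X}$ contains a column of ones for the intercept):
$$\begin{equation} \begin{aligned}
\min_{\boldsymbol{b}} \; (\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b})'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b})
\;\;\Rightarrow\;\; \boldsymbol{X}'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{b}) = \boldsymbol{X}'\mathbf{r} = \mathbf{0}
\end{aligned} \end{equation}$$
Since one column of $\boldsymbol{X}$ is $\mathbf{1}$, the corresponding row of $\boldsymbol{X}'\mathbf{r} = \mathbf{0}$ reads $\mathbf{1}'\mathbf{r} = 0$.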
Following Ben's notation, we can write
$$\begin{equation} \begin{aligned}
\mathbf{1}' \mathbf{r} &= \mathbf{1}'(\mathbf{Y} - \hat{\mathbf{Y}})
= \mathbf{1}' \mathbf{Y} - \mathbf{1}' \hat{\mathbf{Y}} = 0
\end{aligned} \end{equation}$$
Therefore not only is the expected value zero, but the sum of the residuals is exactly zero as well, always ($\mathbf{1}$ is a vector of ones).
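A minimal numerical check of this fact (a sketch with simulated data; the error distribution below is deliberately not mean-zero, precisely because the result is algebraic and does not rest on assumptions about the errors):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Error term deliberately NOT mean-zero: the sum of residuals is
# zero by algebra, not because of assumptions about the errors.
eps = rng.exponential(scale=2.0, size=n)
y = 1.0 + 3.0 * x + eps

X = np.column_stack([np.ones(n), x])       # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
r = y - X @ b                              # residuals

print(r.sum())  # zero up to floating-point error
```

Dropping the column of ones from `X` breaks this: without an intercept the first-order conditions no longer include $\mathbf{1}'\mathbf{r} = 0$.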
The problem with Ben's explanation is in the second line
$$\begin{equation} \begin{aligned}
\mathbf{r} &= (\mathbf{I}-\mathbf{h}) \boldsymbol{Y} \\[6pt]
&= (\mathbf{I}-\mathbf{h}) (\boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt]
\end{aligned} \end{equation}$$
From the decomposition of $\boldsymbol{Y}$, the assumption about the error term $\mathbb{E}(\boldsymbol{\varepsilon}|\boldsymbol{X}) = \mathbf{0}$ seems necessary, but it is not. It is important to note that the hat matrix ($\mathbf{h}$) acts through the OLS estimates $\boldsymbol{b}$: $\mathbf{h} \boldsymbol{Y}= \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$
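The hat-matrix identity $\mathbf{h}\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$ can also be verified numerically (a sketch with simulated data; note that `y` here is arbitrary, since the identity is purely algebraic):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
y = rng.normal(size=n)                                 # any y works: the identity is algebraic

h = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix h = X (X'X)^{-1} X'
b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates b = (X'X)^{-1} X'y

y_hat = h @ y                 # fitted values via the hat matrix
r = (np.eye(n) - h) @ y       # residuals via the annihilator matrix I - h

print(np.allclose(y_hat, X @ b))  # hY equals Xb
print(np.allclose(r, y - y_hat))  # (I - h)Y equals Y - Yhat
```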
Finally, we can verify that, by construction,
$\mathbb{E} [\boldsymbol{Y} | \boldsymbol{X} ]= \boldsymbol{X}\boldsymbol{b} = \hat{\mathbf{Y}}$
and $\boldsymbol{Y} = \hat{\mathbf{Y}} + \mathbf{r}$,
therefore $\mathbb{E} [\mathbf{r} | \boldsymbol{X} ]= \mathbb{E} [\boldsymbol{Y} | \boldsymbol{X}] - \mathbb{E} [\hat{\mathbf{Y}} | \boldsymbol{X}] = \boldsymbol{0}$,
again by construction.
As a sidenote:

> Expectation is a population concept. If we take the expectation of a residual, what would this represent? The sample means of a residual term in the long run or population?
Expectation is a general concept; you can apply it even to a single observation, for example a single coin toss. The proper application depends on the context and the question.
Your context is linear regression estimated with OLS, in which the residuals are a well-defined object. It is important to note that residuals are an estimation quantity: you compute them. Errors are a different thing; they are outside the researcher's control and unobservable, which is why you have to make assumptions about them.
Something like "population residuals" is an ambiguous object. Residuals are entirely in your hands, always. You can think about them in a scheme where the number of observations goes to infinity or covers the entire population, but nothing changes in the above algebra and its implications; they depend on the so-called geometry of OLS.
That said, residuals remain interpretable as random variables and you can compute their expectation. Without loss of generality, you can think of the expectation of residuals (or of estimators, etc.) as conditional on the regressors ($\mathbb{E}(\mathbf{r}|\boldsymbol{X})$), so it does not matter whether the regressors are stochastic or not.