I am trying to go through a proof of the fact that when the number of parameters in a model grows with the sample size $n$, the MLE may fail to be consistent. A standard example (the classic Neyman–Scott problem) is the model $y_{ij}=\mu_i+\epsilon_{ij}$, where $\mu_i$ is the mean for individual $i \in \{1,\dots,n\}$ and $j\in \{1,2\}$, so each individual has two measurements $y_{i1}$ and $y_{i2}$. Each time an individual is added, another mean parameter $\mu_i$ therefore has to be estimated. The errors are i.i.d., $\epsilon_{ij} \sim N(0,\sigma^2)$.
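For reference, the log-likelihood that both of the MLEs below maximize is (since there are $2n$ observations in total)

$$\ell(\mu_1,\dots,\mu_n,\sigma^2) = -n\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{2}(y_{ij}-\mu_i)^2.$$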
The MLE for $\mu_i$ is $\hat{\mu}_i=\frac{y_{i1}+y_{i2}}{2}$.
For $\sigma^2$, the MLE is $\hat{\sigma}^2=\frac{1}{n} \sum_{i=1}^{n} s_{i}^{2}$, where $s_{i}^{2}=\frac{(y_{i1}-\hat{\mu}_i)^2+(y_{i2}-\hat{\mu}_i)^2}{2}$.
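If it helps to see where these come from: setting $\partial\ell/\partial\mu_i=0$ gives $\hat{\mu}_i=\frac{y_{i1}+y_{i2}}{2}$, and setting $\partial\ell/\partial\sigma^2=0$ gives

$$\hat{\sigma}^2=\frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{2}(y_{ij}-\hat{\mu}_i)^2=\frac{1}{n}\sum_{i=1}^{n}s_{i}^{2}.$$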
Since $\hat{\mu}_i=\frac{y_{i1}+y_{i2}}{2}$, the residuals are $y_{i1}-\hat{\mu}_i=y_{i1}-\frac{y_{i1}+y_{i2}}{2}=\frac{y_{i1}-y_{i2}}{2}$ and $y_{i2}-\hat{\mu}_i=y_{i2}-\frac{y_{i1}+y_{i2}}{2}=\frac{y_{i2}-y_{i1}}{2}$. Substituting these into the formula for $s_{i}^{2}$ gives $s_{i}^{2}=\frac{1}{2}\Big[\Big(\frac{y_{i1}-y_{i2}}{2} \Big)^2 + \Big( \frac{y_{i2}-y_{i1}}{2} \Big)^2 \Big]$.
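Since the two squared terms are equal, this collapses to

$$s_{i}^{2}=\frac{1}{2}\cdot 2\cdot\frac{(y_{i1}-y_{i2})^2}{4}=\frac{(y_{i1}-y_{i2})^2}{4}.$$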
At this point it is concluded that $E[s_{i}^{2}]=\frac{\sigma^2}{2}$, and hence that the MLE $\hat{\sigma}^2$ converges to $\frac{\sigma^2}{2}$ rather than to $\sigma^2$. What I'm not understanding is how $E[s_{i}^{2}]=\frac{\sigma^2}{2}$ is obtained from $s_{i}^{2}=\frac{1}{2}\big[\big(\frac{y_{i1}-y_{i2}}{2} \big)^2 + \big( \frac{y_{i2}-y_{i1}}{2} \big)^2 \big]$. I see in this answer that $Y_{i1}-Y_{i2}\sim N(0,2\sigma^2)$, but why is that, and why does it imply the $\frac{\sigma^2}{2}$ result?
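For what it's worth, a quick simulation does reproduce the $\frac{\sigma^2}{2}$ limit numerically, so my confusion is only about the analytic step. A minimal sketch, assuming numpy; the true $\sigma^2=1$ and the $\mu_i$ values here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 1.0   # true error variance (arbitrary choice for this sketch)
n = 200_000    # number of individuals

mu = rng.normal(size=n)                          # arbitrary true means mu_i
y = mu[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(n, 2))

mu_hat = y.mean(axis=1)                          # per-individual MLE (y_i1 + y_i2) / 2
s2 = ((y - mu_hat[:, None]) ** 2).mean(axis=1)   # s_i^2 with divisor 2
print(s2.mean())                                 # MLE of sigma^2: ~0.5, not 1.0
```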
As an aside, is this bias in the estimate of $\sigma^2$ the reason REML is used in mixed-effects models?