I recently had to solve this problem, and hopefully my response can shed some light. (See also here.)
The thing that finally made it click for me was to split the problem into two parts: linear algebra vs. probability.
$
\newcommand{\mat}[1]{\boldsymbol{#1}}
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\rank}[1]{\operatorname{rank}\left[#1\right]}
$
Linear Algebra
To begin, consider the purely deterministic least squares problem for a given $\mat{y} \in \mathbb{R}^n$ and $\mat{X} \in \mathbb{R}^{n\times{p}}$, whose solution
$$
\norm{\mat{e}}^2=\min_{\mat{\beta}}\norm{\mat{y}-\mat{X}\mat{\beta}}^2
$$
corresponds to the residual vector $\mat{e}=\mat{y}-\mat{X}\hat{\mat{\beta}}$ evaluated at the minimizer $\hat{\mat{\beta}}$.
Partitioning the inputs as
$$
\mat{y} = \begin{bmatrix}\mat{y}_{(i)}\\y_i\end{bmatrix} ,
\mat{X} = \begin{bmatrix}\mat{X}_{(i)}\\\mat{x}_i^T\end{bmatrix}
$$
and solving the related leave-one-out problem
$$
\norm{\mat{e}_{(i)}}^2=\min_{\mat{\beta}}\norm{\mat{y}_{(i)}-\mat{X}_{(i)}\mat{\beta}}^2
$$
gives a residual vector $\mat{e}_{(i)}$, corresponding to the original problem with point $i$ held out.
The key linear algebraic result is that the two answers are related by
$$\boxed{
\norm{\mat{e}}^2 = \norm{\mat{e}_{(i)}}^2 + \frac{e_i^2}{1-h_i}
}$$
where $h_i$ is the leverage of the held-out point $i$ (the $i$th diagonal entry of the hat matrix $\mat{H}$ introduced below).
(My derivation was not pretty, and is omitted here for the sake of brevity.)
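For concreteness, here is a minimal numerical check of the identity in numpy; the design matrix, response, and held-out index are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, i = 20, 3, 7                          # sample size, predictors, held-out row (arbitrary)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
e = y - H @ y                               # full-fit residuals
h_i = H[i, i]                               # leverage of point i

mask = np.arange(n) != i
beta_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
e_i = y[mask] - X[mask] @ beta_i            # leave-one-out residuals

print(e @ e, e_i @ e_i + e[i]**2 / (1 - h_i))   # the two sides agree to rounding error
```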
Before turning to the stochastic component, a final linear algebra note is that the residual can be expressed as
$
\mat{e}=\left(\mat{I}-\mat{H}\right)\mat{y}=\mat{G}\mat{y}
$,
where the hat matrix $\mat{H}$ is purely a function of $\mat{X}$, and $$\rank{\mat{H}}+\rank{\mat{G}}=p+m=n$$
This means that for a given $\mat{X}$, the residual $\mat{e}$ has only $m=n-p$ independent components (the residual degrees of freedom): as $\mat{y}$ varies over $\mathbb{R}^n$, $\mat{e}$ varies over the null space of $\mat{X}^T$, i.e. the orthogonal complement of the column space of $\mat{X}$. Similarly, $\mat{e}_{(i)}$ has $m-1$ independently varying components as $\mat{y}_{(i)}$ varies over $\mathbb{R}^{n-1}$, and we can vary $y_i$ over $\mathbb{R}$ independently without changing $\mat{e}_{(i)}$ at all.
(Formally, $\mat{H}=\mat{Q}_1\mat{Q}_1^T$ and $\mat{G}=\mat{Q}_2\mat{Q}_2^T$, where $\mat{Q}=\begin{bmatrix}\mat{Q}_1 & \mat{Q}_2\end{bmatrix}$ is the orthogonal factor in the full QR decomposition of $\mat{X}$ and $\mat{Q}_1$ spans the column space of $\mat{X}$.)
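As a sketch of that construction (assuming $\mat{X}$ has full column rank), $\mat{H}$ and $\mat{G}$ can be built from the full QR factorization in numpy and the rank bookkeeping confirmed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

Q, _ = np.linalg.qr(X, mode='complete')     # full QR: Q is n x n orthogonal
Q1, Q2 = Q[:, :p], Q[:, p:]                 # bases for col(X) and its orthogonal complement
H = Q1 @ Q1.T                               # hat matrix
G = Q2 @ Q2.T                               # I - H

beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(G @ y, y - X @ beta))     # e = G y
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(G))   # p and m = n - p
```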
Probability
Now if our right-hand-side vector is actually
$$
\mat{y} = \mat{X}\mat{\tilde{\beta}} + \mat{\epsilon} ,
\mat{\epsilon} \sim \mathcal{N}_{\mat{0}, \mat{I}\sigma^2}
$$
then as $\mat{\epsilon}$ varies randomly over $\mathbb{R}^n$, the particular data $\mat{y}$ and residuals $\mat{e}$, $\mat{e}_{(i)}$ will also vary, with corresponding Gaussian probability densities.
In this case we have
$$
\begin{aligned}
\hat{\sigma}^2 m &\equiv \norm{\mat{e}}^2 &&\sim \sigma^2\chi^2_{m} \\
\hat{\sigma}^2_{(i)}\left(m-1\right) &\equiv \norm{\mat{e}_{(i)}}^2 &&\sim \sigma^2\chi^2_{m-1} \\
\rho_i^2 &\equiv \frac{e_i^2}{1-h_i} &&\sim \sigma^2\chi^2_1
\end{aligned}
$$
where $\hat{\sigma}^2_{(i)}$ and $\rho_i^2$ are independent, and the common scale factor $\sigma^2$ cancels out of the ratios below.
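These distributional claims are easy to check by Monte Carlo. The sketch below (numpy/scipy, with a purely illustrative random design and an arbitrary held-out index) runs a Kolmogorov–Smirnov test against each claimed law, and also looks at the sample correlation of $\rho_i^2$ and $\norm{\mat{e}_{(i)}}^2$ as a rough consistency check on the independence claim:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, i, sigma, reps = 25, 4, 0, 2.0, 20000      # all illustrative choices
m = n - p
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix
h_i = H[i, i]

Y = X @ beta[:, None] + sigma * rng.normal(size=(n, reps))   # one column per replication
E = Y - H @ Y                                    # full-fit residuals, all replications at once
sse = (E**2).sum(axis=0)                         # ||e||^2
rho2 = E[i]**2 / (1 - h_i)                       # rho_i^2
sse_i = sse - rho2                               # ||e_(i)||^2, via the boxed identity above

print(stats.kstest(sse / sigma**2, stats.chi2(m).cdf).pvalue)        # chi^2_m
print(stats.kstest(sse_i / sigma**2, stats.chi2(m - 1).cdf).pvalue)  # chi^2_{m-1}
print(stats.kstest(rho2 / sigma**2, stats.chi2(1).cdf).pvalue)       # chi^2_1
print(np.corrcoef(rho2, sse_i)[0, 1])            # near zero, consistent with independence
```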
Then the squared internally studentized residual corresponding to $e_i$ is
$$
r_i^2 \equiv \frac{\rho_i^2}{\hat{\sigma}^2} =
\frac{\rho_i^2}{\norm{\mat{e}}^2/m} \implies
\boxed{
\frac{r_i^2}{m} = \frac{\rho_i^2}{\norm{\mat{e}_{(i)}}^2 + \rho_i^2}
} \sim \operatorname{Beta}_{\frac{1}{2}, \frac{m-1}{2}}
$$
where the last step follows from the fact that if $U \sim \chi^2_a$ and $V \sim \chi^2_b$ are independent, then $U/\left(U+V\right) \sim \operatorname{Beta}_{\frac{a}{2}, \frac{b}{2}}$ (here with $U = \rho_i^2/\sigma^2$ and $V = \norm{\mat{e}_{(i)}}^2/\sigma^2$).
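A quick Monte Carlo sketch of the boxed Beta result, under the same illustrative setup as above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, i, reps = 25, 4, 0, 20000
m = n - p
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
H = X @ np.linalg.solve(X.T @ X, X.T)

Y = X @ beta[:, None] + rng.normal(size=(n, reps))
E = Y - H @ Y
r2 = (E[i]**2 / (1 - H[i, i])) / ((E**2).sum(axis=0) / m)   # squared internally studentized residual

print(stats.kstest(r2 / m, stats.beta(0.5, (m - 1) / 2).cdf).pvalue)   # Beta(1/2, (m-1)/2)
```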
Similarly, for completeness, the corresponding externally studentized residual is given by
$$
t_i^2 \equiv \frac{\rho_i^2}{\hat{\sigma}^2_{(i)}} =
\frac{\rho_i^2/\left(1\right)}{\norm{\mat{e}_{(i)}}^2/\left(m-1\right)}
\sim \operatorname{F}_{1, m-1}
$$
which follows because the ratio of two independent chi-squared variables, each divided by its degrees of freedom, has an F distribution.
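And a final sketch checking the F result for $t_i^2$, again under the same illustrative setup (here $\norm{\mat{e}_{(i)}}^2$ is obtained from the boxed identity rather than by refitting):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, i, reps = 25, 4, 0, 20000
m = n - p
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
H = X @ np.linalg.solve(X.T @ X, X.T)

Y = X @ beta[:, None] + rng.normal(size=(n, reps))
E = Y - H @ Y
rho2 = E[i]**2 / (1 - H[i, i])
sse_i = (E**2).sum(axis=0) - rho2                # ||e_(i)||^2 via the boxed identity
t2 = rho2 / (sse_i / (m - 1))                    # squared externally studentized residual

print(stats.kstest(t2, stats.f(1, m - 1).cdf).pvalue)   # F_{1, m-1}
```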