If we take the R-squared to be a comparison of deviances between models (the model of interest, the saturated model, and the constant model), we can write it as (see this answer, CC BY-SA 4.0):
$$R_{GLM}^2 = 1-\frac{D_{RES}}{D_{TOT}} = \frac{D_{REG}}{D_{TOT}}.$$
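As a quick illustration, here is a minimal sketch of this quantity using statsmodels and simulated data (the `deviance` and `null_deviance` attributes are the residual and null deviances of the fitted GLM):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

# Gaussian GLM with an intercept and one predictor
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()

# Deviance-based R-squared: 1 - D_RES / D_TOT
r2_glm = 1 - fit.deviance / fit.null_deviance
print(r2_glm)
```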
Each deviance is expressed as a comparison of log-likelihoods:
$$\begin{array}{ll} \text{Null Deviance} & D_{TOT} = 2(\hat{\ell}_{S} - \hat{\ell}_0), \\[6pt] \text{Explained Deviance} & D_{REG} = 2(\hat{\ell}_{p} - \hat{\ell}_0), \\[6pt] \text{Residual Deviance} & D_{RES} = 2(\hat{\ell}_{S} - \hat{\ell}_{p}). \end{array}$$ In these expressions $\hat{\ell}_S$ is the maximised log-likelihood under a saturated model (one parameter per data point), $\hat{\ell}_0$ is the maximised log-likelihood under a null model (intercept only), and $\hat{\ell}_{p}$ is the maximised log-likelihood under the model of interest (intercept term and $p$ coefficients).
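Note that, by construction, the explained and residual deviances add up to the null deviance, which is why the two expressions for $R_{GLM}^2$ above coincide:
$$D_{REG} + D_{RES} = 2(\hat{\ell}_{p} - \hat{\ell}_0) + 2(\hat{\ell}_{S} - \hat{\ell}_{p}) = 2(\hat{\ell}_{S} - \hat{\ell}_0) = D_{TOT}.$$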
The Gaussian log-likelihoods, assuming the variance of the response is constant across observations but allowing it to differ between models, are as follows.
$$\begin{cases} \hat{\ell}_{p} = - \frac{n}{2} \log{\left(2 \pi\sigma_p^2\right)} - \frac{\sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2}{2 \sigma_p^2}, \\[6pt] \hat{\ell}_{0} = - \frac{n}{2} \log{\left(2 \pi\sigma_0^2\right)} - \frac{\sum_{i=1}^{n}\left(y_i - \bar y\right)^2}{2 \sigma_0^2}, \\[6pt] \hat{\ell}_{S} = - \frac{n}{2} \log{\left(2 \pi\sigma_S^2\right)}. \end{cases}$$ (The saturated model fits each observation exactly, so its residual sum of squares is zero and only the normalising term remains.)
If we assume $\sigma_p^2=\sigma_0^2=\sigma_S^2=\sigma^2$ is the same for the three models, we retrieve the usual definition of $R^2$, because the log-variance terms cancel out in the subtraction for each deviance term.
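For example, writing out the total deviance with the common variance makes the cancellation explicit:
$$D_{TOT} = 2(\hat{\ell}_{S} - \hat{\ell}_0) = 2\left[-\frac{n}{2} \log{\left(2 \pi\sigma^2\right)} + \frac{n}{2} \log{\left(2 \pi\sigma^2\right)} + \frac{\sum_{i=1}^{n}\left(y_i - \bar y\right)^2}{2 \sigma^2}\right] = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \bar{y})^2.$$
The other two deviances reduce in the same way, giving: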
$$\begin{aligned} D_{TOT} &= \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{\sigma^2} \cdot SS_{TOT}, \\[6pt] D_{REG} &= \frac{1}{\sigma^2} \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \frac{1}{\sigma^2} \cdot SS_{REG}, \\[6pt] D_{RES} &= \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \frac{1}{\sigma^2} \cdot SS_{RES}. \end{aligned}$$
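To check this numerically, here is a minimal sketch (plain numpy, simulated data, and an arbitrary common $\sigma^2$, which cancels in the ratio) that computes the deviance-based $R^2$ from the Gaussian log-likelihoods above and compares it with the usual sum-of-squares version:

```python
import numpy as np

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

# Ordinary least-squares fit (intercept + one slope); the null model is the mean
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
y_bar = y.mean()

sigma2 = 1.0  # any common value; it cancels in the ratio

def gauss_loglik(resid, sigma2):
    """Gaussian log-likelihood for given residuals and a fixed variance."""
    n = len(resid)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(resid**2) / (2 * sigma2)

ll_p = gauss_loglik(y - y_hat, sigma2)    # model with intercept and p coefficients
ll_0 = gauss_loglik(y - y_bar, sigma2)    # null (intercept-only) model
ll_s = gauss_loglik(np.zeros(n), sigma2)  # saturated model: zero residuals

D_tot = 2 * (ll_s - ll_0)
D_res = 2 * (ll_s - ll_p)

r2_deviance = 1 - D_res / D_tot
r2_classic = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_bar) ** 2)
print(r2_deviance, r2_classic)  # identical up to floating-point error
```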
As outlined above, the only way to make the variance terms vanish from the deviances is to assume they are the same across the three models, i.e. that the models estimate expectations only, under the assumption that the variance is independent of the mean structure.
Can this assumption be justified, however? Why should we assume the variance to be the same? For the constant model, in realistic scenarios we are pretty much guaranteed not to obtain the same variance estimate, for example. I understand that in this class of models we are mostly interested in the location parameters, but I can see this becoming a problem for other classes of models.
This can be very problematic when computing R-squared on new samples (see, for example, the discussion around this question and the other linked questions), and the same issue should appear for other distribution choices in GLMs, with the exception of single-parameter distributions (Bernoulli, Binomial with known $n$, Multinomial with known $n$, etc.), which have no free dispersion parameter.