When the error term is assumed to be Gaussian (note that this is an assumption about the errors, not about the marginal distribution of all $y$ values; it is a common misconception that the latter matters, and it pretty much never does), minimizing mean squared error is equivalent to maximum likelihood estimation. However, mean squared error remains a useful evaluation criterion even when we are not willing to make such an assumption. For instance, minimizing mean squared error elicits conditional means. Mean squared error also penalizes badly wrong predictions quite harshly: missing by $1$ incurs a penalty of $1$, but doubling the miss to $2$ incurs a penalty of $4$ rather than $2$. That may or may not be desirable behavior, depending on your problem.
In summary, no, considering mean squared error should not be contingent on a Gaussian assumption.
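To make both points concrete, here is a minimal Python sketch (the data are made up purely for illustration):

```python
import numpy as np

# Quadratic penalty: a miss of 2 costs four times a miss of 1,
# while an absolute-error penalty would only cost twice as much.
misses = np.array([1.0, 2.0])
print("squared-error penalties: ", misses ** 2)      # [1. 4.]
print("absolute-error penalties:", np.abs(misses))   # [1. 2.]

# No Gaussian assumption anywhere: for deliberately skewed data,
# the constant prediction that minimizes MSE is still the sample mean.
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=5_000)           # non-Gaussian outcomes
candidates = np.linspace(y.min(), y.max(), 501)      # candidate constant predictions
mse = np.array([np.mean((y - c) ** 2) for c in candidates])
print("MSE-minimizing constant:", candidates[np.argmin(mse)])
print("sample mean:            ", y.mean())
```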
The $R^2$, depending on how you do the calculation, is just a function of the mean squared error.
$$
R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat y_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar y \right)^2}
= 1 - \frac{N \times \text{MSE}}{\sum_{i=1}^{N} \left( y_i - \bar y \right)^2}
$$
With that in mind, such an $R^2$ calculation does not require a Gaussian assumption, either. I discuss here how other $R^2$ calculations present problems.
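As a quick numerical check of the identity above, here is a small sketch (the data and "predictions" are invented for illustration; sklearn's `r2_score` is used only as an independent reference computation):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
y = rng.uniform(0, 10, size=200)            # outcomes; nothing Gaussian required
y_hat = y + rng.uniform(-2, 2, size=200)    # stand-in model predictions

mse = np.mean((y - y_hat) ** 2)
r2_from_mse = 1 - (len(y) * mse) / np.sum((y - np.mean(y)) ** 2)

print(r2_from_mse)          # R^2 computed from the MSE, as in the formula above
print(r2_score(y, y_hat))   # the usual calculation; the two agree
```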
Finally, to determine how good your model's performance is, you have to give it some context. The $R^2$ calculation above means that $R^2$ can be seen as a comparison of how your model performs against how a reasonable benchmark or baseline model performs. However, there is more to the story than just a comparison to a naïve model that always predicts $\bar y$ regardless of the feature values; I like that linked answer by mkt and consider it worthwhile reading.
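To make the benchmark framing explicit, the sketch below (same spirit, made-up data) writes $R^2$ as one minus the ratio of your model's MSE to the MSE of the naïve model that always predicts $\bar y$, so a positive $R^2$ simply means beating that baseline:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.uniform(0, 10, size=200)
y_hat = y + rng.uniform(-2, 2, size=200)    # stand-in model predictions

mse_model = np.mean((y - y_hat) ** 2)
mse_naive = np.mean((y - np.mean(y)) ** 2)  # naive model: always predict the mean of y

r2 = 1 - mse_model / mse_naive              # algebraically the same R^2 as the formula above
print(r2)
```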