It depends on what is meant by $R^2$. In simple settings, multiple definitions give the same value:

1. The squared correlation between the feature and the outcome, $(\text{corr}(x,y))^2$, at least for simple linear regression with just one feature
2. The squared correlation between the true and predicted outcomes, $(\text{corr}(y,\hat y))^2$
3. A comparison of model performance, in terms of square loss (sum of squared errors), to the performance of a model that predicts $\bar y$ every time
4. The proportion of variance in $y$ that is explained by the regression
In more complicated settings, these are not all equal, so it is not clear which of them should be called $R^2$ in such a situation.
I would say that #1 does not make sense unless we are interested in a linear model between two variables, which leaves #2 as the viable option. Unfortunately, that correlation need not have much to do with how close the predictions are to the true values: whether you predict the exactly correct values or are always high (or low) by the same amount, the correlation will be perfect. For instance, $y = (1, 2, 3)$ and $\hat y = (101, 102, 103)$ give a squared correlation of $1$, even though every prediction misses by $100$. That such egregiously poor performance can be missed by this statistic makes it of questionable utility for model evaluation (though it might be useful for flagging a model as having some kind of systematic bias that could be corrected). When we fit a linear model with OLS and include an intercept, such in-sample predictions cannot happen; once we deviate from that setting, all bets are off.
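To see this concretely, a quick check in R with the same numbers as above:

```r
# Predictions that are uniformly off by 100 still correlate perfectly with the truth.
y     <- c(1, 2, 3)
y_hat <- c(101, 102, 103)
cor(y, y_hat)^2   # exactly 1, despite every prediction missing by 100
```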
However, Minitab appears to take the stance that $R^2$ is calculated according to idea #3.
$$
R^2 = 1 - \dfrac{\sum_{i=1}^{N} \left(y_i - \hat y_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar y\right)^2}
$$
(This could be argued to be the Efron pseudo $R^2$ mentioned in the comments.)
This means that Minitab takes the stance, with which I agree, that $R^2$ is a function of the sum of squared errors, which is a typical optimization criterion for fitting the parameters of a nonlinear regression. Consequently, any criticism of $R^2$ is also a criticism of SSE, MSE, and RMSE.
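As a sketch of that calculation in R: the same formula applies to fitted values from any model, linear or not. The data and self-starting logistic model below are illustrative assumptions (not from the question), just to show the computation on a nonlinear `nls` fit.

```r
# R^2 as defined above: 1 - SSE/SST, computed directly from observed and fitted values.
r_squared <- function(y, y_hat) {
  sse <- sum((y - y_hat)^2)     # squared error of the model's predictions
  sst <- sum((y - mean(y))^2)   # squared error of predicting mean(y) every time
  1 - sse / sst
}

# Illustrative (assumed) data and a self-starting logistic model fit with nls()
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- SSlogis(x, Asym = 10, xmid = 5, scal = 1) + rnorm(50, sd = 0.5)
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal))

r_squared(y, fitted(fit))
```

The same `r_squared()` call works identically for an `lm` fit.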
I totally disagree with the following Minitab comment.
> As you can see, the underlying assumptions for R-squared aren't true for nonlinear regression.
To write the above formula, I assumed nothing except that we are interested in estimating conditional means and use square loss to measure the pain of missing. You can go through the decomposition of the total sum of squares (the denominator) to get the "proportion of variance explained" interpretation in the linear OLS setting (with an intercept), sure, but you do not have to.
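For reference, that decomposition in the linear OLS setting with an intercept is

$$
\sum_{i=1}^{N} \left(y_i - \bar y\right)^2 = \sum_{i=1}^{N} \left(\hat y_i - \bar y\right)^2 + \sum_{i=1}^{N} \left(y_i - \hat y_i\right)^2,
$$

where the cross term vanishes because the residuals sum to zero and are orthogonal to the fitted values; dividing through by the left-hand side gives the "proportion of variance explained" reading.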
Consequently, I totally disagree with Minitab on this.
A comment from Sal Mangiafico illustrates the difference between definitions #2 and #3 with a small example:

> With `X = c(1,2,3,4,5,6)`, `Y = c(4,5,6,8,7,9)`, and the model `model1 = lm(Y ~ X)`, `cor(Y, predict(model1))^2` = 0.889. But if we use, say, a linear model without the intercept, `model2 = lm(Y ~ X + 0)`, the correlation between y and y-hat is still 0.889, even though the fit is relatively poor. On the other hand, the pseudo R-squares, a la #3 in @Dave's answer, are 0.889 and 0.214. – Sal Mangiafico, Mar 31 '23 at 23:36
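That comment's numbers can be reproduced directly in R (a quick sketch; the values match those quoted above):

```r
X <- c(1, 2, 3, 4, 5, 6)
Y <- c(4, 5, 6, 8, 7, 9)

model1 <- lm(Y ~ X)       # with an intercept
model2 <- lm(Y ~ X + 0)   # forced through the origin

# Definition #2: squared correlation between observed and predicted values
cor(Y, predict(model1))^2   # ~0.889
cor(Y, predict(model2))^2   # ~0.889, even though the no-intercept fit is worse

# Definition #3 (the formula above): 1 - SSE/SST
1 - sum((Y - predict(model1))^2) / sum((Y - mean(Y))^2)   # ~0.889
1 - sum((Y - predict(model2))^2) / sum((Y - mean(Y))^2)   # ~0.214
```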