Let there be $n$ observations and $p$ features (so $\beta_0$ is the intercept). Thus, your linear predictor is $p_i = \hat\beta_0 + \hat\beta_1 x_{i, 1} +\cdots + \hat\beta_p x_{i, p}$, and the predicted mean is $\hat y_i = e^{p_i}$.
A natural choice is to use the optimization criterion for calculating the regression parameters. Like other generalized linear models, Poisson regression uses maximum likelihood estimation of the parameters, meaning that the estimate $\hat\beta$ is the value that maximizes $\underset{i=1}{\overset{n}{\sum}}\left(y_ip_i - e^{p_i}\right)$. (You can refer to the Wikipedia derivation for the details.)
Up to an additive term that does not depend on the parameters (see the derivation), this is the log-likelihood of the Poisson distribution.
Consequently, it is totally reasonable to talk about how high of a likelihood your model has.
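As a minimal sketch of the criterion above, the function below evaluates the negative of that sum for a given set of linear predictors, so that lower is better. The function name `poisson_loss` and the toy numbers are illustrative assumptions, not part of any library.

```python
import math

def poisson_loss(y, p):
    """Negative of sum(y_i * p_i - exp(p_i)).

    y: observed counts
    p: linear predictors (the log of the predicted means)
    The log(y_i!) term of the full log-likelihood is left out, as in the text,
    because it does not depend on the parameters.
    """
    return -sum(yi * pi - math.exp(pi) for yi, pi in zip(y, p))

# Hypothetical toy data: four observed counts and two candidate sets of predictions.
y = [0, 1, 3, 2]
p_good = [math.log(m) for m in (0.5, 1.0, 3.0, 2.0)]  # predicted means close to y
p_bad = [math.log(m) for m in (3.0, 3.0, 0.5, 0.5)]   # predicted means far from y

loss_good = poisson_loss(y, p_good)
loss_bad = poisson_loss(y, p_bad)
print(loss_good < loss_bad)  # the better predictions achieve the lower loss
```

On its own, either number is hard to interpret, which is the motivation for comparing against a baseline.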
However, there is no context for what constitutes a good or even tolerable likelihood. $R^2$-style metrics that are limited to $(0, 1)$ or $(-\infty, 1)$ go some way toward providing that context. If you consider the usual $R^2$, it is a comparison of the square loss achieved by your model to that of a naïve model that always predicts $\bar y$. This makes some sense. Since the regression aims to predict the conditional expected value, what better baseline to beat than a model that always predicts the overall mean $\bar y?$
$$
R^2 = 1-\dfrac{
\underset{i=1}{\overset{n}{\sum}}\left(
y_i -\hat y_i
\right)^2
}{
\underset{i=1}{\overset{n}{\sum}}\left(
y_i -\bar y
\right)^2
}\\=
1-\dfrac{
\text{Square loss of your model}
}{
\text{Square loss of the baseline model}
}
$$
One way to view the fraction is as a comparison between the sum of squared residuals achieved by your model and the sum of squared residuals achieved by the baseline model. This sum of squared residuals is the loss function, often called "square loss".
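The loss-ratio reading of the usual $R^2$ can be sketched directly. The helper name `r_squared` and the numbers are made up for illustration.

```python
def r_squared(y, y_hat):
    """Usual R^2 written as 1 - (square loss of model) / (square loss of baseline)."""
    y_bar = sum(y) / len(y)
    model_loss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    baseline_loss = sum((yi - y_bar) ** 2 for yi in y)  # baseline always predicts y_bar
    return 1 - model_loss / baseline_loss

# Hypothetical observations and predictions.
y = [1, 2, 3, 6]
y_hat = [1.5, 2.0, 3.5, 5.0]

r2_model = r_squared(y, y_hat)
r2_perfect = r_squared(y, y)            # perfect predictions
r2_baseline = r_squared(y, [3.0] * 4)   # predicting the mean (3.0) everywhere
print(r2_model, r2_perfect, r2_baseline)
```

Perfect predictions give a value of one and the baseline itself gives zero, which are the two reference points the rest of the discussion relies on.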
In Poisson regression, we do not aim to minimize the square loss. We aim to maximize the Poisson log-likelihood, equivalent to minimizing the negative Poisson log-likelihood, $\left(-\underset{i=1}{\overset{n}{\sum}}\left(y_ip_i - e^{p_i}\right)\right)$.
Consequently, I say to apply the same idea. You have your model that achieves some Poisson loss. A model that always predicts the overall mean $\bar y$ has linear predictor $\log \bar y$ for every observation (so that $e^{\log \bar y} = \bar y$), and you can find its Poisson loss by calculating $\left(-\underset{i=1}{\overset{n}{\sum}}\left(y_i\log\bar y - \bar y\right)\right)$. Put them in an $R^2$-style equation.
$$
R^2 = 1-\dfrac{
\left(-\underset{i=1}{\overset{n}{\sum}}\left(
y_ip_i
-
e^{
p_i}
\right)\right)
}{
\left(-\underset{i=1}{\overset{n}{\sum}}\left(
y_i\log\bar y
-
\bar y
\right)\right)
}\\=
1-\dfrac{
\text{Poisson loss of your model}
}{
\text{Poisson loss of the baseline model}
}
$$
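A rough sketch of this ratio follows. It assumes the baseline's linear predictor is $\log \bar y$, so that its predicted mean is $\bar y$; the function names and data are hypothetical. Because the $\log(y_i!)$ term is dropped, the loss values can be zero or negative when counts are larger, so the example deliberately uses small counts where both losses are positive.

```python
import math

def poisson_loss(y, p):
    """Negative of sum(y_i * p_i - exp(p_i)), with p the linear predictors."""
    return -sum(yi * pi - math.exp(pi) for yi, pi in zip(y, p))

def poisson_pseudo_r2(y, p):
    """1 - (Poisson loss of the model) / (Poisson loss of the mean-only baseline)."""
    y_bar = sum(y) / len(y)
    # Assumption: the baseline predicts the mean y_bar, i.e. linear predictor log(y_bar).
    baseline_p = [math.log(y_bar)] * len(y)
    return 1 - poisson_loss(y, p) / poisson_loss(y, baseline_p)

# Hypothetical toy data with a small mean (1.5).
y = [0, 1, 3, 2]
p_model = [math.log(m) for m in (0.5, 1.0, 3.0, 2.0)]
p_baseline = [math.log(1.5)] * 4

r2_model = poisson_pseudo_r2(y, p_model)
r2_baseline = poisson_pseudo_r2(y, p_baseline)
print(r2_model, r2_baseline)
```

The baseline scores zero by construction, and a model that beats it on Poisson loss scores above zero here.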
In fact, this is how McFadden's pseudo $R^2$ works for binomial models (e.g., logistic regression), but with binomial likelihoods instead of Poisson, so there is precedent for extending the conventional $R^2$ in this way.
An advantage of this is that it arises as a natural generalization of a popular and understood technique, $R^2$. Drawbacks include the relative obscurity (harder to explain to bosses/customers) and perhaps a lack of software implementation.
One property held by the usual $R^2$ that the above, annoyingly, will not satisfy is being $1$ when predictions are perfect. For that reason, we might choose to subtract out the loss value achieved by perfect predictions, both in the numerator and denominator. That value does not depend on the parameters, so subtracting it does not change which parameter values minimize the loss. The shifted loss is therefore still a valid loss function and a valid statistic to use in the numerator (for your model) and the denominator (for the baseline model).
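One way to sketch this adjustment is shown below. It works with predicted means rather than linear predictors so that a perfect prediction of a zero count can be handled, taking $0 \cdot \log 0$ as $0$; that convention, the function names, and the data are assumptions made for the example.

```python
import math

def poisson_loss(y, mu):
    """Negative of sum(y_i * log(mu_i) - mu_i), written in terms of predicted means."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention: 0 * log(0) is treated as 0, so mu_i = 0 is allowed when y_i = 0.
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi
    return -total

def shifted_poisson_r2(y, mu):
    """R^2-style ratio after subtracting the loss of perfect predictions."""
    y_bar = sum(y) / len(y)
    perfect = poisson_loss(y, y)                   # every predicted mean equals its observation
    baseline = poisson_loss(y, [y_bar] * len(y))   # always predict the overall mean
    model = poisson_loss(y, mu)
    return 1 - (model - perfect) / (baseline - perfect)

# Hypothetical toy data.
y = [0, 1, 3, 2]
r2_model = shifted_poisson_r2(y, [0.5, 1.0, 3.0, 2.0])
r2_perfect = shifted_poisson_r2(y, y)
r2_baseline = shifted_poisson_r2(y, [1.5] * 4)
print(r2_model, r2_perfect, r2_baseline)
```

With the shift in place, perfect predictions score one and the baseline still scores zero.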
On the other hand, there are ways of deriving an $R^2$-style metric using deviance statistics that are more general than the familiar sum of squares in the usual $R^2$. There is appeal to this, and you might like the idea of giving the "proportion of deviance explained". Disadvantages include the fact that people are less likely to have a technical understanding of deviance like they (at least think they) have when it comes to variance, and the lack of comparison to a baseline model (which I find extremely intuitive). A further disadvantage is that I suspect Ben's linked explanation relies on the linear aspect of the GLM and on maximum likelihood estimation of the parameters. If his deviance $R^2$ behaves like the regular $R^2$ does in the linear case, the "proportion of deviance explained" interpretation will be invalid outside of those conditions, much as "proportion of variance explained" is only valid for $R^2$ under particular conditions like linear models and estimation via least squares. Out-of-sample testing could be funky, too, and I have a strong opinion about $R^2$ and $R^2$-style statistics when they are applied to data on which the model was not trained.
However, depending on what you value, you might be interested in just the usual $R^2$. If you want to measure how you do in terms of squared residuals, the usual $R^2$ could be an excellent option. Indeed, getting back to binomial models, the usual $R^2$ is a viable metric. Your reference does list some drawbacks of the usual $R^2$, yes, but they might be okay. First, $R^2$ being interpreted as the proportion of variance explained is the exception, not the rule, and you don't even need nonlinearity to wreck that interpretation, depending on how you estimate the coefficients (1)(2). Second, while it is true that the usual $R^2$ is not bounded below by $0$ when you estimate the coefficients through a method other than least squares (Poisson regression minimizes Poisson loss, not square loss), a value below $0$ signals to you that your model is outperformed in terms of square loss by the naïve baseline model that always predicts $\bar y$. It strikes me as a feature, not a bug, for a statistic to flag cases where your performance is poor.
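To illustrate the point about values below zero, here is a small sketch that scores a Poisson-style model with the usual $R^2$ by first converting linear predictors to predicted means with the exponential. The function name and the deliberately poor predictions are made up for the example.

```python
import math

def r_squared_from_linear_predictors(y, p):
    """Usual R^2 computed on the predicted means exp(p_i) of a log-link model."""
    y_bar = sum(y) / len(y)
    y_hat = [math.exp(pi) for pi in p]
    model_loss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    baseline_loss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - model_loss / baseline_loss

# Hypothetical toy data: the predicted means run opposite to the observed counts.
y = [0, 1, 3, 2]
p_bad = [math.log(m) for m in (3.0, 3.0, 0.5, 0.5)]

r2_bad = r_squared_from_linear_predictors(y, p_bad)
print(r2_bad < 0)  # worse square loss than always predicting the mean
```

A negative value here simply says that always predicting $\bar y$ would have achieved a lower square loss than these predictions.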
For all of these, unfortunately, there is no objective measure of what constitutes a good model. While it is true that a value less than zero indicates performance worse than your baseline, which is reasonably considered a "must beat" level of performance, how high constitutes good performance depends on the problem at hand.
Hopefully discussing these will guide you to choosing an alternative, should none of them suit your needs.