Why not always use covariate instead of offset in Poisson Regression?

Question

I've just started studying Poisson regression and came across the two models:

$$ \begin{align*} \log \mathbb{E}(\text{count}) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \log(T) \\ \log \mathbb{E}(\text{count}) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_T \log(T) \end{align*} $$ where $T$ is time/exposure.

I'll start with the interpretation of a coefficient. Let's take $\beta_1$ for example. I'll do the interpretation both with respect to the count and the rate.

Offset model

When $x_1$ increases by one unit and $x_2$ stays constant, then the log count increases by $\beta_1$. Or, equivalently, the count is multiplied by $e^{\beta_1}$.
When $x_1$ increases by one unit and $x_2$ stays constant, then the log rate increases by $\beta_1$. Or, equivalently, the rate is multiplied by $e^{\beta_1}$.

The interpretation is identical for both the mean and the rate of the Poisson distribution we are modeling.
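To spell out why the two readings coincite in the offset model, here is the one-line rearrangement, writing the rate as $\mathbb{E}(\text{count})/T$ (my notation, not part of the original model statement):

$$ \log \frac{\mathbb{E}(\text{count})}{T} = \log \mathbb{E}(\text{count}) - \log(T) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 $$

So the offset model is a model for the log rate with no $T$ left on the right-hand side, which is why $\beta_1$ reads the same way for the count and for the rate.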

Model with time as covariate

When $x_1$ increases by one unit, $x_2$ stays constant and $T$ stays constant, then the log count increases by $\beta_1$. Or, equivalently, the count is multiplied by $e^{\beta_1}$.
When $x_1$ increases by one unit, $x_2$ stays constant and $T$ stays constant, then the log rate increases by $\beta_1$. Or, equivalently, the rate is multiplied by $e^{\beta_1}$.

The second interpretation is less obvious in this case, but it follows from the first. Since $\text{rate} = \frac{\text{count}}{T}$, and under the first interpretation $T$ stays constant while the count is multiplied by $e^{\beta_1}$, the new rate is $\text{rate}_{new} = \frac{\text{count}_{new}}{T_{new}} = \frac{e^{\beta_1} \text{count}_{old}}{T_{old}} = e^{\beta_1} \text{rate}_{old}$.
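The same step can be written on the log scale, again taking the rate to be $\mathbb{E}(\text{count})/T$ (an assumption about what "rate" means here). Subtracting $\log(T)$ from both sides of the covariate model gives:

$$ \log \frac{\mathbb{E}(\text{count})}{T} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + (\beta_T - 1)\log(T) $$

At a fixed $T$ the last term cancels when comparing $x_1$ with $x_1 + 1$, so the rate ratio is $e^{\beta_1}$ as derived above. The rate itself, however, is proportional to $T^{\beta_T - 1}$, so it changes with the exposure unless $\beta_T = 1$.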

My points

The only difference between the two model interpretations is the added condition that $T$ stays constant.
The second model does not restrict the coefficient of $\log(T)$ to be $1$, so it will be a better model.
Choosing between the two is sometimes a matter of whether you want to model rates or counts. However, you can get the rate by dividing the count by $T$. So, using the second model is not an issue in that respect.

Question

I get that the offset is used for easier interpretations. But upon exploring the above, the difference is not that great. Given that the second model will always give better results (I think?) why not always use the covariate time model? Is there another advantage to using offsets that I'm missing?

Note

People keep suggesting that the question "In a Poisson model, what is the difference between using time as a covariate or an offset?" will help me. I read it before I posted mine. I understand the difference between the two models. My question is different: I argue that the offset model is always inferior. And even if my question is answered in the comments of that question (which it is not, or at least not sufficiently for me to understand it), it would be helpful for other people to have this as a separate question so they can find it more easily and not have to search the comments of other questions.

This question has already been discussed here. See this thread and this one — Peter Flom, Dec 21 '23 at 21:16
@PeterFlom I have read those and they helped me understand the difference of the two. My point is that there is no need to use offset. I might be missing something of course. — John Katsantas, Dec 21 '23 at 21:20
If you use an offset, then that term is not multiplied by a parameter. If you use a covariate, it is. Sometimes you don't want it to be multiplied. Examples are given in threads. — Peter Flom, Dec 21 '23 at 21:22
@PeterFlom I checked the examples and I still have my question. Besides that, I think it's appropriate for my query to have its own question and answer here, since it is a slightly different question. Anyone else looking for it would find it more easily as well.
Besides that, why would someone not want the term multiplied? You can still get the rate and the count. Can you tell me which example in the thread answers that? I can't find it. Thanks by the way. — John Katsantas, Dec 21 '23 at 21:45
I think there's a flaw in your argument. The second model should be written as $\log{E(count)} = \alpha_0 + \alpha_1 x_1 + \alpha_2x_2 + \alpha_T\log(T)$ as the $\beta_i$'s are not equivalent in the two models (unless in the unlikely event that the estimate of $\alpha_T$ happens to be exactly 1). — JimB, Dec 21 '23 at 23:01
@JimB Nice remark. Not sure how I should write this but I was referring to our two options for the model. Not that we've fitted both of them and they ended up having the same coefficients. It helped using the same symbols to compare the two interpretations. But yes, the two sets of coefficients will be different as the second model can fit the data better. — John Katsantas, Dec 21 '23 at 23:16
I'm a skeptical person so while it might be that you obtained similar coefficients for your data for both models, that would not be typical. And your argument would only have weight if that situation were typical. — JimB, Dec 21 '23 at 23:22
However, if one is considering having an offset with $T$ (i.e., using $\log T$ as the offset in glm in R, for example) AND having $\log T$ as a covariate, then by all means just include $\log T$ in the model and don't use the offset option. You could include $\log T$ AND the offset $\log T$ and end up with an identical model just with the $\log T$ coefficient differing by exactly 1 in the two models. — JimB, Dec 21 '23 at 23:27
Just to make it clear. The coefficients (and the interpretations) will not be the same. In fact the second model will fit the data better and give us better predictions. That's why I argue against using the offset. If I'm not mistaken offset is used because it allows for easier interpretations. I stress the fact that interpretations in the two models are just as easy. Or at least not so different to choose offset over the better second model. I'm not saying the interpretations will end up being the same. Given that, I'm trying to find a reason to use offset over the covariate option. — John Katsantas, Dec 21 '23 at 23:33
I'll think about this some more. But what you are doing is adding in another coefficient to estimate and that doesn't always result in a better model (at least if one subscribes to the AIC religion). And sometimes there is a "physical model" that requires just an offset with a coefficient of 1. — JimB, Dec 21 '23 at 23:37
Both models have a coefficient $\beta_T$. In the offset case, we just restrict it to having a value of 1. Relaxing this restriction will either give a better model or an equivalent one (with $\beta_T=1$ as well). Is there any chance the model will really end up being worse? — John Katsantas, Dec 21 '23 at 23:41
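The nesting discussed in this exchange can be checked numerically. The following is only a sketch under stated assumptions: the data are made up, and `solve`, `fit_poisson` and `loglik` are hypothetical hand-written helpers (a plain Newton-Raphson fitter in pure Python), not functions from any library. It fits the offset model, the covariate model, and the model with both the offset and the covariate, then compares coefficients, log-likelihoods and AIC.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_poisson(X, y, offset, iters=50):
    """Newton-Raphson for a log-link Poisson GLM; column 0 of X must be the intercept."""
    p = len(X[0])
    # Start from the intercept-only fit so the first Newton step stays moderate.
    beta = [math.log(sum(y) / sum(math.exp(o) for o in offset))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row)) + o)
              for row, o in zip(X, offset)]
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def loglik(beta, X, y, offset):
    """Poisson log-likelihood up to the constant -sum(log(y!))."""
    total = 0.0
    for row, yi, o in zip(X, y, offset):
        eta = sum(b * x for b, x in zip(beta, row)) + o
        total += yi * eta - math.exp(eta)
    return total

# Made-up illustration data: exposure T and observed counts y.
T = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 7, 7, 11, 13, 13, 17]
log_T = [math.log(t) for t in T]
zeros = [0.0] * len(T)

X_int = [[1.0] for _ in T]            # intercept only
X_cov = [[1.0, lt] for lt in log_T]   # intercept + log(T)

b_offset = fit_poisson(X_int, y, log_T)  # offset model: coefficient of log(T) fixed at 1
b_cov = fit_poisson(X_cov, y, zeros)     # covariate model: coefficient of log(T) estimated
b_both = fit_poisson(X_cov, y, log_T)    # offset AND covariate together

ll_offset = loglik(b_offset, X_int, y, log_T)
ll_cov = loglik(b_cov, X_cov, y, zeros)
aic_offset = 2 * 1 - 2 * ll_offset       # one estimated parameter
aic_cov = 2 * 2 - 2 * ll_cov             # two estimated parameters

print("log(T) coefficient, covariate model:", b_cov[1])
print("log(T) coefficient, offset + covariate:", b_both[1])
print("log-likelihoods (offset, covariate):", ll_offset, ll_cov)
print("AIC (offset, covariate):", aic_offset, aic_cov)
```

The two $\log T$ coefficients should differ by exactly 1, and the covariate model's maximised log-likelihood cannot be lower than the offset model's, since the offset model is the special case $\beta_T = 1$. Whether the extra parameter pays for itself under AIC, or in out-of-sample prediction, depends on the data; the sketch does not settle that.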
@JimB I'd really like to hear a "physical model" example that demands an offset if you have one. I think that's what I'm missing. — John Katsantas, Dec 21 '23 at 23:42
This question was in fact answered where Peter Flom linked; sometimes you want to preserve the unit of your denominator (e.g. time is always in the constant weeks, not some multiplicative factor). For that you use an offset, not a covariate. — PBulls, Dec 22 '23 at 05:52
@PBulls Not trying to be a pain here but I still don't see the issue. There is no one stopping you from calculating what you are interested in. The coefficient does not restrict us on this. I can give $T$ whatever value I want and predict the count. I can then divide the count by this $T$ if I want the rate. I would get what you are saying if I was somehow forced to use a specific $T$. But I'm not. If you could find the time to give me an example where the covariate model cannot give me what I want to predict, I would really appreciate it. — John Katsantas, Dec 22 '23 at 11:09
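To make the "predict the count, then divide by $T$" procedure concrete, here is a tiny sketch with hypothetical coefficients (the values `0.5` and `0.8` are invented for illustration and come from no fitted model; the other covariates are dropped for brevity). It computes the implied rate at several exposures under both models.

```python
import math

def expected_count(T, beta0, beta_T):
    """Expected count from log E(count) = beta0 + beta_T * log(T)."""
    return math.exp(beta0 + beta_T * math.log(T))

beta0 = 0.5            # hypothetical intercept
exposures = [1, 10, 100]

# Offset model: the coefficient of log(T) is fixed at 1.
rates_offset = [expected_count(T, beta0, 1.0) / T for T in exposures]

# Covariate model: hypothetical estimated coefficient of 0.8.
rates_cov = [expected_count(T, beta0, 0.8) / T for T in exposures]

for T, r_off, r_cov in zip(exposures, rates_offset, rates_cov):
    print(f"T={T:>3}  rate (offset)={r_off:.4f}  rate (covariate)={r_cov:.4f}")
```

Under the offset model the rate is the same at every exposure, while under the covariate model with a coefficient other than 1 the computed rate depends on which $T$ is plugged in. Both can be calculated; the difference is whether "the rate" is a single number or a function of $T$.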
@User1865345 This was suggested earlier as well. Even though the question is different, there are some mentions in the comments about what I'm referring to but nothing that answered my question. — John Katsantas, Dec 22 '23 at 11:12
@JohnKatsantas: If you don't trust the homogeneity assumption implied by the offset, you could even go a step further and model log(Time) with more than a single parameter, e.g., with regression splines. — Michael M, Dec 22 '23 at 12:24