You could of course do that if you chose to, so I assume you are asking why Holt did not choose that formula.
Suppose that the series $y_t$ actually satisfies a linear trend model, say $$y_t = a_0 + a_1 t + \epsilon_t$$ where $\epsilon_t$ is some i.i.d. random noise (mean 0, constant variance). Let's also assume that we are far enough along in applying whichever smoothing formulas we chose so that $\ell_{\tau} \approx a_0 + a_1 \tau$ and $b_{\tau}\approx a_1.$ According to the first Holt formula,
\begin{align*}
\ell_{t} & =\alpha(a_{0}+a_{1}t+\epsilon_{t})+(1-\alpha)(\ell_{t-1}+b_{t-1})\\
& \approx\alpha(a_{0}+a_{1}t+\epsilon_{t})+(1-\alpha)(a_{0}+a_{1}[t-1]+a_{1})\\
& =a_{0}+a_{1}t+\alpha\epsilon_{t}.
\end{align*}
Using Holt's second formula as stated, we get
\begin{align*}
b_{t} & =\beta(\ell_{t}-\ell_{t-1})+(1-\beta)b_{t-1}\\
& \approx\beta(a_{0}+a_{1}t+\alpha\epsilon_{t}-[a_{0}+a_{1}(t-1)])+(1-\beta)a_{1}\\
& =a_{1}+\beta\alpha\epsilon_{t},
\end{align*}
whereas with your second formula we get
\begin{align*}
b_{t} & =\beta(y_{t}-y_{t-1})+(1-\beta)b_{t-1}\\
& \approx\beta(a_{1}+\epsilon_{t}-\epsilon_{t-1})+(1-\beta)a_{1}\\
& =a_{1}+\beta(\epsilon_{t}-\epsilon_{t-1}).
\end{align*}
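Both estimates of slope are unbiased, but yours contains more noise. Writing $\sigma^2$ for the variance of $\epsilon_t$, the noise term $\beta\alpha\epsilon_t$ in Holt's version has variance $\beta^2\alpha^2\sigma^2$, whereas the term $\beta(\epsilon_t-\epsilon_{t-1})$ in yours has variance $2\beta^2\sigma^2$, which is larger for every $\alpha\in(0,1]$.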
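If you would rather see this numerically than trust the approximations, here is a minimal simulation sketch. The function name `simulate_slopes`, the parameter values, and the choice to start both recursions at the true $a_0$ and $a_1$ are my own assumptions for illustration. It runs Holt's slope update and the proposed alternative on the same noisy linear series and collects both slope estimates after a burn-in.

```python
import random
import statistics


def simulate_slopes(alpha=0.3, beta=0.2, a0=5.0, a1=2.0, sigma=1.0,
                    n=5000, burn_in=500, seed=0):
    """Return (holt_slopes, alt_slopes) for y_t = a0 + a1*t + eps_t.

    All defaults are illustrative assumptions, not values from the question.
    """
    rng = random.Random(seed)
    # Start at the true values so the "far enough along" assumption holds.
    level = a0
    b_holt = a1
    b_alt = a1
    y_prev = a0 + rng.gauss(0.0, sigma)  # y_0
    holt, alt = [], []
    for t in range(1, n + 1):
        y = a0 + a1 * t + rng.gauss(0.0, sigma)
        # Holt's level update, then Holt's slope update (uses level differences).
        new_level = alpha * y + (1 - alpha) * (level + b_holt)
        b_holt = beta * (new_level - level) + (1 - beta) * b_holt
        # Proposed alternative: slope update from raw differences y_t - y_{t-1}.
        b_alt = beta * (y - y_prev) + (1 - beta) * b_alt
        level, y_prev = new_level, y
        if t > burn_in:
            holt.append(b_holt)
            alt.append(b_alt)
    return holt, alt


holt, alt = simulate_slopes()
print("mean slope  (Holt, alternative):", statistics.mean(holt), statistics.mean(alt))
print("slope variance (Holt, alternative):",
      statistics.pvariance(holt), statistics.pvariance(alt))
```

Both sample means should sit near the true slope, with the alternative showing a visibly larger variance. The alternative recursion never uses the level, so tracking it alongside Holt's recursion on the same data does not change its behavior.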
Addendum: That was a bit hand-wavy even by my lax standards, so let me try to make it slightly more rigorous. Let $\lambda_t$ and $\eta_t$ be the errors in $\ell_t$ and $b_t$ respectively, i.e., $$\ell_t = a_0 + a_1 t +\lambda_t$$ and $$b_t = a_1 + \eta_t.$$ Substituting these into the update for $b_t$ (with $b_{t-1} = a_1 + \eta_{t-1}$), the Holt formula reduces to $$b_t = a_1 + \beta(\lambda_t - \lambda_{t-1}) + (1-\beta)\eta_{t-1}$$ and the proposed alternative reduces to $$b_t = a_1 + \beta(\epsilon_t - \epsilon_{t-1}) + (1-\beta)\eta_{t-1}.$$ The two differ only in the term multiplied by $\beta$. Assuming that the exponential smoothing is actually smoothing things, we expect $\lambda_t$ to have lower variance than $\epsilon_t,$ and hence $\lambda_t - \lambda_{t-1}$ to be less noisy than $\epsilon_t - \epsilon_{t-1}.$ I'm pretty sure you can prove that via an induction argument.
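Short of a proof, the variance claim can at least be checked empirically. The sketch below is only a spot check for one parameter choice, not a general argument. The names `level_errors` and `diff_variance` and all parameter values are assumptions of mine. It runs Holt's two formulas on a simulated linear-trend series and records the level error $\lambda_t$ next to the raw noise $\epsilon_t$.

```python
import random
import statistics


def level_errors(alpha=0.3, beta=0.2, a0=5.0, a1=2.0, sigma=1.0,
                 n=5000, burn_in=500, seed=1):
    """Return (lam, eps): level errors lambda_t and raw noise eps_t.

    Parameter defaults are illustrative assumptions only.
    """
    rng = random.Random(seed)
    level, slope = a0, a1  # start at the true values
    lam, eps = [], []
    for t in range(1, n + 1):
        e = rng.gauss(0.0, sigma)
        y = a0 + a1 * t + e
        # Holt's two updates, exactly as in the formulas above.
        new_level = alpha * y + (1 - alpha) * (level + slope)
        slope = beta * (new_level - level) + (1 - beta) * slope
        level = new_level
        if t > burn_in:
            lam.append(level - (a0 + a1 * t))  # lambda_t
            eps.append(e)                      # epsilon_t
    return lam, eps


def diff_variance(xs):
    """Variance of the first differences x_t - x_{t-1}."""
    return statistics.pvariance([b - a for a, b in zip(xs, xs[1:])])


lam, eps = level_errors()
print("var(lambda_t), var(eps_t):",
      statistics.pvariance(lam), statistics.pvariance(eps))
print("var of first differences (lambda, eps):",
      diff_variance(lam), diff_variance(eps))
```

The second printed pair is the one that matters for the slope update, since both versions of the formula feed a first difference into $b_t$.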