Intuitively, it's because both $\beta$ and $\mu$ control the average scale of the volatility: $\beta$ scales $e^{h_t/2}$ directly, while $\mu$ sets the mean level of the log-volatility $h_t$.
Say you define a new parameter $\mu' = \mu + \Delta$. Adding $\Delta$ to both sides of the state equation for $h_{t+1}$ then gives:
$$h_{t+1} + \Delta = \mu' + \phi((h_t + \Delta) - \mu') + \sigma_\eta \eta_t$$
Rewriting the observation equation in terms of $h_t + \Delta$:
$$y_t = \beta e^{-\Delta/2} e^{\frac{h_t + \Delta}{2}} \varepsilon_t$$
If you then define $\beta' = \beta e^{-\Delta/2}$ and $h'_t = h_t + \Delta$, it follows that:
$$y_t = \beta' e^{h'_t/2} \varepsilon_t$$
$$h'_{t+1} = \mu' + \phi(h'_t - \mu') + \sigma_\eta \eta_t$$
This is exactly the model form you started with, so the parameter pairs $(\beta, \mu)$ and $(\beta e^{-\Delta/2}, \mu + \Delta)$ give the same likelihood for every value of $\Delta$; that is the identifiability issue being alluded to.
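To see this concretely, here is a minimal NumPy sketch (the parameter values and the helper name `simulate_sv` are just illustrative, not taken from the paper) that simulates the model under $(\beta, \mu)$ and under the shifted pair $(\beta e^{-\Delta/2}, \mu + \Delta)$ using the same noise draws; the two runs produce the same $y_t$, so no data set can tell the two parameterizations apart:

```python
import numpy as np

def simulate_sv(beta, mu, phi, sigma_eta, eps, eta):
    """Simulate y_t = beta * exp(h_t / 2) * eps_t with AR(1) log-volatility h_t."""
    T = len(eps)
    h = np.empty(T)
    h[0] = mu                              # start the latent process at its mean level
    for t in range(T - 1):
        h[t + 1] = mu + phi * (h[t] - mu) + sigma_eta * eta[t]
    return beta * np.exp(h / 2) * eps

rng = np.random.default_rng(0)
T = 1000
eps = rng.standard_normal(T)               # observation noise epsilon_t
eta = rng.standard_normal(T)               # state noise eta_t

beta, mu, phi, sigma_eta = 0.8, -1.0, 0.95, 0.2   # arbitrary illustrative values
delta = 2.5                                # an arbitrary shift Delta

y_orig  = simulate_sv(beta, mu, phi, sigma_eta, eps, eta)
y_shift = simulate_sv(beta * np.exp(-delta / 2), mu + delta, phi, sigma_eta, eps, eta)

print(np.max(np.abs(y_orig - y_shift)))    # differences are at floating-point round-off level
```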
In theory it doesn't matter which of the two parameters you drop, but for the specific application in the paper you linked (MCMC), sampling typically works better when the parameters are less strongly correlated in the posterior. In this case, that means dropping $\beta$, which the paper demonstrates empirically.
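Concretely, dropping $\beta$ just means fixing $\beta = 1$ and letting $\mu$ absorb the overall scale (by the argument above with $\Delta = 2\log\beta$, any pair $(\beta, \mu)$ is equivalent to $(1, \mu + 2\log\beta)$), so the model is estimated in the form:
$$y_t = e^{h_t/2} \varepsilon_t, \qquad h_{t+1} = \mu + \phi(h_t - \mu) + \sigma_\eta \eta_t$$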