As you've reasoned already, the errors themselves are unobservable: whilst we do observe the regressand and the regressors, we can only recover the residuals, not the errors. I think your question amounts to asking how we should estimate a model in which one of the regressors is inherently unobservable. There are a couple of ways to estimate the $\theta$ parameter. One method, mentioned by @EB3112 in the comments, is maximum likelihood (ML) estimation, but this requires assuming a distribution for the errors, usually Gaussian, or t-distributed if the tails are fatter. Alternatively, we can sidestep the distributional assumption by estimating $\theta$ with nonlinear least squares (NLS).
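For concreteness, here is a minimal sketch of the ML route, assuming Gaussian errors and conditioning on a zero pre-sample error. The function name, the simulated series, and the use of `scipy.optimize.minimize_scalar` are my own choices for illustration, not part of your model:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ma1_neg_loglik(theta, y):
    """Conditional Gaussian negative log-likelihood for y_t = eps_t + theta*eps_{t-1},
    recovering the errors recursively with the pre-sample error set to zero."""
    eps = np.empty_like(y, dtype=float)
    eps[0] = y[0]                          # assumes eps_0 = 0 before the sample
    for t in range(1, len(y)):
        eps[t] = y[t] - theta * eps[t - 1]
    sigma2 = np.mean(eps ** 2)             # profile out the error variance
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + np.sum(eps ** 2) / (2 * sigma2)

# simulate an MA(1) with a hypothetical theta = 0.5, then maximise the likelihood
rng = np.random.default_rng(0)
eps_true = rng.normal(size=400)
y = eps_true + 0.5 * np.concatenate(([0.0], eps_true[:-1]))

res = minimize_scalar(ma1_neg_loglik, bounds=(-0.99, 0.99), args=(y,), method="bounded")
print(res.x)                               # should land near 0.5
```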
To do this, we first turn the MA(1) process into its AR($\infty$) representation. Consider the MA(1) model $$y_t=(1+\theta L)\epsilon_t$$ where $L$ is the lag (backshift) operator, so that $L^k y_t = y_{t-k}$. Under the invertibility condition $|\theta|<1$, the operator $(1+\theta L)$ has an inverse, defined by the geometric series $$(1+\theta L)^{-1} := \sum^{\infty}_{k=0} (-\theta L)^k$$ so our process is $$(1+\theta L)^{-1}y_t = \epsilon_t$$ Now, using the definition of the inverse, $$\sum^{\infty}_{k=0} (-\theta L)^k y_t = \epsilon_t$$ Thus, our AR($\infty$) representation is given by $$y_t=-\sum^{\infty}_{k=1}(-\theta L)^k y_t +\epsilon_t = -\sum^{\infty}_{k=1}(-\theta)^k y_{t-k} +\epsilon_t$$ This representation is useful since the sequence $\{y_t\}$ is entirely observable. However, in reality it is impossible to have infinitely many observations before the current period. Therefore, we can truncate the representation: because the process is invertible ($|\theta|<1$), the coefficients $(-\theta)^k$ decay geometrically as the number of lags increases, so beyond some number of lags $j$ they are negligible.
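To see the decay concretely, here is a short numerical check with a hypothetical $\theta = 0.6$ (chosen only for illustration): it prints the first few AR coefficients and verifies that the truncated representation recovers $\epsilon_t$ on a simulated series:

```python
import numpy as np

theta = 0.6                                              # hypothetical value, |theta| < 1
print(np.round([-(-theta) ** k for k in range(1, 11)], 4))   # coefficients on y_{t-k}: geometric decay

rng = np.random.default_rng(0)
eps = rng.normal(size=500)
y = eps + theta * np.concatenate(([0.0], eps[:-1]))      # simulate the MA(1)

# truncated AR representation: eps_t ~ y_t + sum_{k>=1} (-theta)^k y_{t-k}
t = 200
print(y[t] + sum((-theta) ** k * y[t - k] for k in range(1, t + 1)), eps[t])  # the two are close
```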
Therefore, using the available observations (say $y_1$ is the first observation and $y_T$ the last), we can minimise the residual sum of squares implied by the truncated AR representation. Our NLS estimator for $\theta$ is $$\hat{\theta}_{NLS} = \arg \min_{\theta} \left[\sum_{t=2}^T \left(y_t+\sum^{t-1}_{k=1}(-\theta)^k y_{t-k}\right)^2\right]$$ which we can solve with a numerical optimiser. Note that, in contrast with MLE, we never assumed any distribution for the errors.
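A minimal sketch of this NLS estimator, again on a simulated series with a hypothetical $\theta = 0.6$ and again relying on a bounded scalar optimiser (the function and variable names are mine, not from any standard package):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def truncated_rss(theta, y):
    """Residual sum of squares from the truncated AR representation,
    e_t(theta) = y_t + sum_{k=1}^{t-1} (-theta)^k y_{t-k} (1-indexed, as in the formula above)."""
    rss = 0.0
    for t in range(1, len(y)):                # 0-indexed: t = 1 is the second observation
        e_t = y[t] + sum((-theta) ** k * y[t - k] for k in range(1, t + 1))
        rss += e_t ** 2
    return rss

# recover the hypothetical theta = 0.6 from a simulated MA(1)
rng = np.random.default_rng(1)
eps = rng.normal(size=300)
y = eps + 0.6 * np.concatenate(([0.0], eps[:-1]))

res = minimize_scalar(truncated_rss, bounds=(-0.99, 0.99), args=(y,), method="bounded")
print(res.x)                                  # should land close to 0.6
```

Restricting the search to $(-0.99, 0.99)$ simply enforces the invertibility condition $|\theta|<1$ that the derivation relies on.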