It's not a silly question --- this is a common misconception with the recursive method, and I've seen many people make the same mistake. The problem here is that your "and so on" glosses over the fact that recursive application of that substitution leaves a remainder term at the end. To see this, first apply the recursive substitution $N-1$ times to get the equation:
$$\begin{align}
x_t
&= \epsilon_t + \theta \epsilon_{t-1} \\[6pt]
&= \epsilon_t + \theta (x_{t-1} - \theta \epsilon_{t-2}) \\[6pt]
&= \epsilon_t + \theta x_{t-1} - \theta^2 (x_{t-2} - \theta \epsilon_{t-3}) \\[6pt]
&\ \ \vdots \\[6pt]
&= \epsilon_t - \sum\limits_{j=1}^{N-1} (-\theta)^jx_{t-j} - (-\theta)^{N} \epsilon_{t-N}. \\[6pt]
\end{align}$$
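If it helps, here is a minimal numerical check in Python using NumPy (the values $\theta = 0.6$, $T = 500$ and $N = 20$ are just illustrative, not from your question) confirming that this finite-$N$ identity holds exactly for a simulated MA(1) series:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
theta, T, N = 0.6, 500, 20       # illustrative values only

# Simulate an MA(1) process: x_t = eps_t + theta * eps_{t-1}
eps = rng.normal(size=T)
x = eps.copy()
x[1:] += theta * eps[:-1]

# Finite-N identity:
# x_t = eps_t - sum_{j=1}^{N-1} (-theta)^j x_{t-j} - (-theta)^N eps_{t-N}
t = T - 1
rhs = (eps[t]
       - sum((-theta) ** j * x[t - j] for j in range(1, N))
       - (-theta) ** N * eps[t - N])
print(np.isclose(x[t], rhs))     # True: the identity is exact, not an approximation
```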
Taking the limit as $N \rightarrow \infty$ then gives the proper limiting equation, which includes an extra term that you left out:$^\dagger$
$$\begin{align}
x_t
&= \lim_{N \rightarrow \infty} \bigg[ \epsilon_t - \sum\limits_{j=1}^{N-1} (-\theta)^jx_{t-j} - (-\theta)^{N} \epsilon_{t-N} \bigg] \\[6pt]
&=\epsilon_t - \sum\limits_{j=1}^{\infty}(-\theta)^jx_{t-j} - \underbrace{\lim_{j \rightarrow \infty} (-\theta)^j \epsilon_{t-j}}_\text{You left this out}.
\end{align}$$
In order to get the equation you want, you need that last term to disappear (i.e., converge stochastically to zero). One way to ensure this is to have $|\theta|<1$ together with bounded variance on the series of error terms (a fixed finite error variance is sufficient here). If $|\theta| = 1$ then that final limiting term fails to disappear, and if $|\theta| > 1$ it explodes.
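To see the three regimes concretely, here is a small sketch (the three values of $\theta$ are arbitrary) tracking the size of the leftover term $(-\theta)^N \epsilon_{t-N}$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=201)       # white noise with fixed finite variance

# Size of the leftover term (-theta)^N * eps_{t-N} as N grows, for a
# stable, a unit, and an explosive value of theta (illustrative values)
for theta in (0.6, 1.0, 1.5):
    print(theta, [abs((-theta) ** N * eps[-1 - N]) for N in (5, 20, 50)])
# |theta| < 1: shrinks toward zero;  |theta| = 1: stays the size of the noise;
# |theta| > 1: grows geometrically
```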
(Incidentally, this case carries a broader lesson about good practice when taking limits of recursions. When you work with these types of equations, it is good practice to first write out the equation for an arbitrary finite number of applications of the recursion, and only then take limits. Skipping those intermediate steps is what leads people to omit an important term or otherwise misunderstand the proper limiting equation; as noted, I've seen many people make exactly the mistake you did for this reason.)
$^\dagger$ Incidentally, the same general reasoning arises if you use the lag operator $L$ in operator theory. If you would like to read about the invertibility properties of the lag operator, I recommend Kasparis (2016) as a good introduction. Although this area is a bit complicated, you can think of the lag operator as obeying the following heuristic equation:
$$\frac{1}{1 - \theta L} = \sum_{i=0}^\infty (\theta L)^i + \lim_{N \rightarrow \infty} (\theta L)^N.$$
The last term in the operator expansion maps to zero if $|\theta| < 1$ and if it is applied to a sequence of random variables with bounded variance. (If not, the term can explode and you can get indeterminate forms from this equation, which is why I call it a heuristic equation.) The operator method is a bit more complicated, but roughly speaking, the lag operator acts like a real number with respect to this type of equation.
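As a rough illustration of this truncation idea (again just a sketch with the same arbitrary values as above, and with the sign flipped so the operator being inverted is the $1 + \theta L$ polynomial of the MA(1) model), dropping the remainder term and applying the truncated series recovers $\epsilon_t$ up to a geometrically shrinking error when $|\theta| < 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, T = 0.6, 500              # same illustrative values as above
eps = rng.normal(size=T)
x = eps.copy()
x[1:] += theta * eps[:-1]        # x_t = (1 + theta L) eps_t

# Truncated inversion of (1 + theta L): keep the first N terms of the
# operator series and drop the limiting remainder term
t = T - 1
for N in (2, 5, 20):
    approx = sum((-theta) ** i * x[t - i] for i in range(N))
    print(N, abs(approx - eps[t]))   # error is |(-theta)^N eps_{t-N}|, shrinking in N
```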