I'll offer an explanation based on marginal vs joint probability distributions. First, let's introduce some notation.
Let $Y_1,\ldots,Y_n$ be the random sample, where $Y_i = (Y_{i1},\ldots,Y_{iT})$ is the $i$th (random) observation vector; suppose also that we have some observed covariates $x_i$. Thus the observable data are $(Y_i, x_i)$, for $i=1,\ldots,n$.
Typically, but not always, the values in $Y_i$ are repeated measures taken over time or space, all pertaining to the same individual $i$. Thus, when building the likelihood function, we may assume independence between $Y_i$ and $Y_j$ for $i \neq j$, but we cannot assume independence among the components of $Y_i$.
Here is where the random effects come in. We assume that each individual has an unobserved random effect $c_i$, where unobserved means that we cannot measure it, and random means that it is a random variable. Since the individuals are independent, we assume that the $c_i$ are also independent, with $c_i\sim N(0,\sigma_c^2)$; for simplicity, we can assume $\sigma_c^2$ is known.
Therefore the complete data for individual $i$ are $(Y_i, x_i, c_i)$, and the joint probability distribution is
$$f(y_i, c_i; x_i, \beta) = f(y_i \mid c_i; x_i, \beta)\,\phi(c_i),$$
where $\phi$ denotes the $N(0,\sigma_c^2)$ density.
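To make this concrete, here is a minimal simulation sketch in Python. It assumes a linear random-intercept specification $Y_{it} = x_i^\top\beta + c_i + \epsilon_{it}$; the linear form and all parameter values are illustrative choices of mine, not implied by the general notation above.

```python
import numpy as np

rng = np.random.default_rng(0)

n, T = 200, 5                   # individuals, repeated measures per individual
beta = np.array([1.0, -0.5])    # illustrative fixed effects
sigma_c, sigma_e = 1.0, 0.5     # random-effect and residual SDs (assumed known)

x = rng.normal(size=(n, 2))           # covariate vector x_i for each individual
c = rng.normal(0.0, sigma_c, size=n)  # unobserved random effects c_i ~ N(0, sigma_c^2)

# Y_{it} = x_i' beta + c_i + eps_{it}: sharing c_i across t induces
# dependence *within* Y_i, while the rows stay independent across i.
eps = rng.normal(0.0, sigma_e, size=(n, T))
Y = (x @ beta)[:, None] + c[:, None] + eps

# Empirical check: correlation between two repeated measures of the same
# individual should be roughly sigma_c^2 / (sigma_c^2 + sigma_e^2) = 0.8.
print(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1])
```

Sharing one $c_i$ across all $T$ measurements is exactly what creates the within-individual dependence: here the implied correlation between any two repeated measures of the same individual is $\sigma_c^2/(\sigma_c^2+\sigma_e^2) = 0.8$.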
If we were given the value of each $c_i$, we could use this joint distribution to estimate $\beta$. Of course, no one will ever give us the $c_i$, so we have to come up with a better strategy.
The common$^1$ approach is to work unconditionally, or marginally, with respect to $c_i$. That is, the idea is to base the likelihood on the marginal density
$$\bar{f}(y_i; x_i, \beta) = \int f(y_i \mid c_i; x_i, \beta)\,\phi(c_i)\,dc_i.$$
This guarantees that the estimator of $\beta$ does not depend on the particular values of the $c_i$, although it does depend on their distribution; in this way, the dependence among the components of $Y_i$ is accounted for.
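To see what this integral looks like numerically, here is a hedged continuation of the simulation sketch above: the marginal log-likelihood is approximated with Gauss–Hermite quadrature and maximized over $\beta$, with $\sigma_c$ and $\sigma_e$ treated as known. All function names are mine.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite
from scipy.optimize import minimize
from scipy.stats import norm

# Nodes/weights for integrals against the standard normal density:
# int g(z) phi(z) dz  ~=  sum_k w_k g(z_k)
z_k, w_k = hermegauss(30)
w_k = w_k / np.sqrt(2 * np.pi)

def marginal_negloglik(b):
    mu = x @ b                        # conditional mean, apart from c_i
    total = 0.0
    for i in range(n):
        # f(y_i | c; x_i, b) evaluated at each node c = sigma_c * z_k
        cond = norm.pdf(Y[i][:, None], mu[i] + sigma_c * z_k, sigma_e).prod(axis=0)
        total += np.log(cond @ w_k)   # integrate c_i out against phi
    return -total

fit = minimize(marginal_negloglik, x0=np.zeros(2), method="Nelder-Mead")
print(fit.x)  # should roughly recover beta = (1.0, -0.5) from the simulation
```

In this Gaussian toy case the integral is actually available in closed form, but the quadrature version is the one that carries over to non-Gaussian responses, and is essentially what generalized linear mixed-model software does (e.g. adaptive Gauss–Hermite quadrature in `lme4::glmer`).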
To sum up, the random effects are just nuisance random quantities that we cannot measure. To deal with this missing-data issue, the most reasonable choice is to eliminate them from the joint distribution, i.e. to integrate them out.
$^1$ There is another frequentist approach called Hierarchical Generalized Linear Models in which, essentially, each $c_i$ is treated as a parameter and estimated jointly with $\beta$, using the so-called $h$-likelihood.
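To illustrate the footnote's idea on the Gaussian toy model above: maximizing the joint density $f(y \mid c;\beta)\,\phi(c)$ over $(\beta, c_1,\ldots,c_n)$ simultaneously reduces to a penalized least-squares problem. The sketch below is my own illustration of that idea, not the actual HGLM/$h$-likelihood machinery.

```python
import numpy as np
from scipy.optimize import minimize

def h_negloglik(theta):
    b, c_hat = theta[:2], theta[2:]
    resid = Y - (x @ b)[:, None] - c_hat[:, None]
    # -log f(y | c; b): residual sum of squares (up to constants);
    # -sum_i log phi(c_i): a ridge-type penalty shrinking c_hat toward 0
    return (resid ** 2).sum() / (2 * sigma_e ** 2) + (c_hat ** 2).sum() / (2 * sigma_c ** 2)

fit = minimize(h_negloglik, x0=np.zeros(2 + n), method="L-BFGS-B")
print(fit.x[:2])  # beta estimate; fit.x[2:] are the predicted random effects
```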