The Bartlett kernel weight at lag $j$ is the proportion of 'windows' of length $m$ covering a point $t$ that also cover $t+j$.
The basic idea in these estimators is actually in the part you left out. Suppose we want to estimate $$V=\frac{1}{T}\mathrm{var}[\sum_t (x_t-\mu)].$$
We know (since $\sum_t (x_t-\mu)$ has mean zero)
$$V = E\left[\frac{1}{T} \Big(\sum_t (x_t-\mu)\Big)^2\right]= \frac{1}{T}\sum_{s,t} E[(x_t-\mu)(x_s-\mu)]$$
So what about
$$V_0 = \frac{1}{T} (\sum_t (x_t-\mu))^2= \frac{1}{T}\sum_{s,t} (x_t-\mu)(x_s-\mu)$$
That looks like a natural estimator. Well, actually, it isn't even an estimator, because it depends on the unknown $\mu$; it also has high variance; and if you plug in $\hat\mu=\bar X$ it becomes a terrible estimator, because it's then identically zero.
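A quick numerical check of the 'identically zero' claim (a minimal Python sketch; the simulated series is arbitrary, since the result is pure algebra: $\sum_t (x_t-\bar X)=0$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)        # any series works; the result is algebraic

# V_0 with the plug-in mean: (sum of deviations from the mean)^2 / T
v0_hat = (x - x.mean()).sum() ** 2 / len(x)
print(v0_hat)                   # zero, up to floating-point rounding
```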
However, if you just used
$$V_1= \frac{1}{T}\sum_{s,t} I_{s=t}(x_t-\hat\mu)(x_s-\hat\mu) $$
it would be a good estimator if the $x_t$ were independent (it's the usual estimator up to $T-1$ vs $T$). There isn't much bias from using $\hat\mu$ instead of $\mu$.
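To see it's the familiar estimator, here's a one-comparison sketch against NumPy's variance (which also uses the $1/T$ divisor by default):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)

# only the s = t terms survive the indicator, so V_1 is the usual 1/T variance
v1 = ((x - x.mean()) ** 2).sum() / len(x)
print(np.isclose(v1, np.var(x)))   # True: np.var defaults to the 1/T divisor
```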
On the other hand, if you used the same $V_1$ in the more realistic setting where the series isn't independent (or at least uncorrelated), you'd get bias from all the $(s,t)$ pairs that you left out.
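Here's a small simulation of that bias; the AR(1) model and the coefficient $0.7$ are just my choices for illustration. For a stationary AR(1) with unit innovation variance, the marginal variance is $1/(1-\phi^2)\approx 1.96$, while the long-run target $V$ tends to $1/(1-\phi)^2\approx 11.1$, and $V_1$ homes in on the wrong one:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, T, reps = 0.7, 500, 500

v1_draws = []
for _ in range(reps):
    x = np.empty(T)
    x[0] = rng.normal() / np.sqrt(1 - phi**2)   # start in stationarity
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    v1_draws.append(((x - x.mean()) ** 2).sum() / T)

print(np.mean(v1_draws))          # about 1.96: the marginal variance
print(1 / (1 - phi) ** 2)         # about 11.1: the long-run variance V
```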
So, we'll try
$$V_w= \frac{1}{T}\sum_{s,t} w_{st}(x_t-\hat\mu)(x_s-\hat\mu) $$
where the weights $w_{st}$ are something less dumb than either $w_{st}=1$ always, or $w_{tt}=1$ and $w_{st}=0$ otherwise.
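As a reference implementation, here's a direct Python sketch of $V_w$, assuming (as for all the kernels below) that the weight depends only on the lag $|s-t|$. It's the literal $O(T^2)$ double sum, written for clarity rather than speed:

```python
import numpy as np

def v_w(x, weight):
    """V_w = (1/T) * sum over (s, t) of w_{st} (x_t - xbar)(x_s - xbar),
    where w_{st} = weight(|s - t|) and weight acts elementwise on arrays."""
    T = len(x)
    d = x - x.mean()
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return (weight(lags) * np.outer(d, d)).sum() / T

# weight = lambda L: (L == 0) recovers V_1; weight = lambda L: np.ones_like(L)
# recovers the identically-zero estimator from plugging the mean into V_0
```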
$V_w$ has two sorts of bias. It has bias from $w_{st}$ being too small, so that it misses some of the true correlation; I'll call that truncation bias. It also has bias from $w_{st}$ being too big, so that the impact of substituting $\hat\mu$ for $\mu$ is big; I'll call that centering bias, and it depends approximately just on the average of the weights $w_{st}$. We want to balance truncation bias against centering bias.
If we assume that the correlation between $x_t$ and $x_s$ tends to go down as $|s-t|$ increases, then for a fixed centering bias the truncation bias is reduced by pushing all the weight towards small $|s-t|$. That is, you want $w_{st}=1$ for small $|s-t|$ and then a pretty quick switch to $w_{st}=0$ for larger $|s-t|$. That's also good from a mean squared error viewpoint.
Unfortunately, the estimator with $w_{st}=I_{|s-t|<m}$ need not give non-negative estimated variances, which is inconvenient. If you want guaranteed non-negative estimated variances, you need $w_{st}$ to decrease more smoothly with $|s-t|$.
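You can watch that happen. With the truncated weights on short white-noise series (the sample size $T=30$ and cutoff $m=10$ are just choices that make it visible), a sizeable fraction of the estimates come out negative:

```python
import numpy as np

rng = np.random.default_rng(2)

def v_truncated(x, m):
    # w_{st} = 1 for |s - t| < m, 0 otherwise
    T = len(x)
    d = x - x.mean()
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return ((lags < m) * np.outer(d, d)).sum() / T

estimates = [v_truncated(rng.normal(size=30), m=10) for _ in range(1000)]
print(np.mean(np.array(estimates) < 0))   # a noticeable fraction is negative
```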
One way to make sure the estimator is non-negative is to write it as a sum of squared things. You can take
$$V_{\text{box}}=\frac{1}{T} \sum_t \frac{1}{2m+1}\left(\sum_{j=-m}^m (x_{t+j}-\hat\mu) \right)^2$$
That's a quadratic in the $x_t$, so it's of the form
$$V_w= \frac{1}{T}\sum_{s,t} w_{st}(x_t-\hat\mu)(x_s-\hat\mu)$$
and it's pretty clear that $w_{st}$ is decreasing in $|s-t|$ -- in fact, it's zero for $|s-t|>2m$. If you did the algebra, which I'm not going to do, you'd find that the weights $w_{st}$ decrease linearly in $|s-t|$ until they reach zero. It's the Bartlett kernel!
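If you'd rather let the computer do the algebra, this sketch builds the quadratic-form matrix implied by $V_{\text{box}}$ and checks that, away from the ends of the series, its entries follow the triangle $w_{st} = \max(2m+1-|s-t|,\,0)/(2m+1)$:

```python
import numpy as np

T, m = 50, 4
width = 2 * m + 1

# V_box is (1/T) * sum over windows of (window sum)^2 / (2m+1), so its
# quadratic-form matrix is a sum of outer products of window indicators
W = np.zeros((T, T))
for c in range(m, T - m):           # centers whose window fits in the sample
    ind = np.zeros(T)
    ind[c - m : c + m + 1] = 1.0
    W += np.outer(ind, ind) / width

s = T // 2                          # a point well away from the boundary
for j in range(width + 2):
    assert np.isclose(W[s, s + j], max(width - j, 0) / width)
print("interior weights match the Bartlett triangle")
```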
So, the Bartlett kernel is a set of weights that gives pretty good variance estimation (consistent if $m$ increases at the right rate) and guarantees you don't get embarrassed by a negative variance. The cost is a little more bias and variability than the step-function kernels.
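For completeness, here's the same recipe written over lags rather than the double sum; this is the familiar Newey--West form of the Bartlett estimator, and the bandwidth $b$ is up to you (the textbook prescription has $b$ growing like $T^{1/3}$, which is not part of this argument):

```python
import numpy as np

def v_bartlett(x, b):
    """Long-run variance with Bartlett weights 1 - j/b on lags j < b
    (the Newey-West estimator); guaranteed non-negative."""
    T = len(x)
    d = x - x.mean()
    v = (d ** 2).sum() / T                       # lag-0 term
    for j in range(1, b):
        gamma_j = (d[j:] * d[:-j]).sum() / T     # lag-j sample autocovariance
        v += 2 * (1 - j / b) * gamma_j
    return v

# On the AR(1) example above, v_bartlett(x, b) with b growing slowly with T
# moves from the marginal variance toward the long-run variance V
```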
It's also possible to show that you can't do better; you can't get the truncation bias lower for a given average weight without risking negative variances.
This was all for $x_t-\mu$ but it generalises to everything else by the usual Taylor series arguments.