SUMMARY: Which zero-correlation model is most appropriate depends on the data. There is no universally correct choice.
I will consider the same Machines data set. It has several Workers, each repeatedly tested on all three Machines. The maximal mixed model is thus
lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)
which fits a full $3\times 3$ covariance matrix of the random effects. The fixed effects define the mean score for each Machine; there are three Machines, so this is a three-dimensional vector $\mu$. On top of that, each Worker $i$ deviates from this $\mu$ by some "random" three-dimensional vector $\mu_i$. These $\mu_i$ are random vectors with mean $(0,0,0)$ and some $3\times 3$ covariance matrix $\Sigma$. Such a covariance matrix has 6 parameters:
$$\Sigma=\begin{bmatrix}\sigma^2_A&\sigma^2_{AB} &\sigma^2_{AC}\\\sigma^2_{AB}&\sigma^2_B&\sigma^2_{BC}\\\sigma^2_{AC}&\sigma^2_{BC}&\sigma^2_C\end{bmatrix}.$$
Note that
lmer(score ~ 1 + Machine + (1 + Machine | Worker), d)
yields an equivalent model, only parameterized differently. The exact parametrization can also depend on the chosen contrasts, but I find it easiest to discuss this with dummy contrasts, hence my (0 + Machine | Worker) specification above.
The crucial point here is that every model that simplifies the random effect structure can be understood as imposing some specific constraints on $\Sigma$.
The random intercept (1 | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\end{bmatrix}.$$ Here each Worker gets a random scalar intercept $m_i$, i.e. $\mu_i = (m_i, m_i, m_i)$; the entries of $\mu_i$ are correlated with correlation 1.
The random interaction (1 | Worker:Machine) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_{wm}&0&0\\0&\sigma^2_{wm}&0\\0&0&\sigma^2_{wm}\end{bmatrix}.$$ Here $\mu_i$ has three entries with the same variances but that are assumed to be uncorrelated.
In the following let A, B, and C be dummy variables for three Machines. Then (0 + A | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_A&0&0\\0&0&0\\0&0&0\end{bmatrix}.$$ Here $\mu_i$ has only one non-zero entry with variance $\sigma^2_A$. Similarly for (0 + B | Worker) and (0 + C | Worker).
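Although the models in this post are fitted with lmer in R, the covariance algebra itself is easy to illustrate with a few lines of numpy. Here are the three building blocks written out explicitly (all variance values below are arbitrary, purely for illustration):

```python
import numpy as np

s2_w, s2_wm, s2_A = 4.0, 2.0, 3.0   # illustrative variance values, chosen arbitrarily

# (1 | Worker): every entry equals sigma_w^2 (correlation 1 between levels)
Sigma_intercept = s2_w * np.ones((3, 3))

# (1 | Worker:Machine): equal variances on the diagonal, zero correlations
Sigma_interaction = s2_wm * np.eye(3)

# (0 + A | Worker): variance in the A slot only
Sigma_A = np.zeros((3, 3))
Sigma_A[0, 0] = s2_A
```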
The second crucial thing to realize is that a sum of uncorrelated multivariate Gaussians with $\Sigma_1$ and $\Sigma_2$ has covariance matrix $\Sigma_1+\Sigma_2$. So to understand what happens with more complicated random structures we can simply add up covariance matrices written above.
For example,
lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)
fits a covariance matrix with 2 parameters (this form of the covariance matrix is known as "compound symmetry"):
$$\Sigma=\begin{bmatrix}\sigma^2_{wm}+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_{wm}+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_{wm}+\sigma^2_w\end{bmatrix}.$$
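This additivity can also be checked empirically: simulate two independent Gaussian vectors with the two building-block covariance matrices and look at the covariance of their sum. A small numpy sketch (illustrative variance values):

```python
import numpy as np

s2_w, s2_wm = 4.0, 2.0              # illustrative variance values
Sigma_w  = s2_w * np.ones((3, 3))   # (1 | Worker) block
Sigma_wm = s2_wm * np.eye(3)        # (1 | Worker:Machine) block
Sigma_cs = Sigma_w + Sigma_wm       # compound symmetry

# empirical check: the covariance of a sum of independent Gaussians adds up
rng = np.random.default_rng(0)
n = 100_000
x = rng.multivariate_normal(np.zeros(3), Sigma_w, size=n)
y = rng.multivariate_normal(np.zeros(3), Sigma_wm, size=n)
Sigma_emp = np.cov((x + y).T)       # approximately equal to Sigma_cs
```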
The model that Rune Christensen recommends for uncorrelated factors
lmer(score ~ 1 + Machine + (1 + A + B + C || Worker), d)
fits a model with 4 parameters that is a bit more general than compound symmetry (and is only 2 parameters away from the maximal model):
$$\Sigma=\begin{bmatrix}\sigma^2_A+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_B+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_C+\sigma^2_w\end{bmatrix}.$$
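In numpy terms, this matrix is a common off-diagonal constant plus a per-level diagonal; zeroing out one of the per-level variances gives the $\Sigma$ implied by the zero-correlation model discussed next (illustrative values):

```python
import numpy as np

s2_w = 4.0
s2_A, s2_B, s2_C = 3.0, 2.0, 1.0    # illustrative values

# (1 + A + B + C || Worker): shared sigma_w^2 plus a variance for each level
Sigma = s2_w * np.ones((3, 3)) + np.diag([s2_A, s2_B, s2_C])

# setting s2_A = 0 gives the Sigma implied by (1 + B + C || Worker)
Sigma_m2 = s2_w * np.ones((3, 3)) + np.diag([0.0, s2_B, s2_C])
```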
The model that you have "until recently" had in mind (your m2) is the model that Reinhold Kliegl recommends as the zero-correlation model:
lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)
If c1 and c2 were produced using the default treatment contrasts (with A being the reference level), then this model can be written as
lmer(score ~ 1 + Machine + (1 + B + C || Worker), d)
I agree with Rune that it is a somewhat unreasonable model because it treats factor levels differently: B and C get their own variance but A does not (the corresponding $\Sigma$ would look the same as the one above but without $\sigma^2_A$), whereas all three machines should arguably be treated on the same footing.
Thus, the most reasonable sequence of nested models seems to be:
max model --> comp symmetry w/ unequal vars --> comp symmetry --> rand. intercept
A note on marginal distributions
This post was inspired by Rune Christensen's email here https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026847.html. He talks about $9\times 9$ marginal covariance matrices for individual observations within each Worker. I find this more difficult to think about, compared to my presentation above. The covariance matrix from Rune's email can be obtained from any $\Sigma$ as $$\Sigma_\text{marginal} = J_{m} \otimes \Sigma + \sigma^2 I,$$ where $J_m$ is the $m\times m$ matrix of ones (the ones matrix, rather than the identity, because repeated measurements on the same Machine share the same random effect and hence covary), $m$ is the number of repetitions per Worker/Machine combination (in this dataset $m=3$), and $\sigma^2$ is the residual variance.
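A small numpy sketch of this marginal matrix, using `np.kron`. Note that repeated measurements on the same Machine share a random effect, so the Kronecker factor is the $m\times m$ ones matrix $J_m$; the numbers and the repetition-major ordering of the 9 observations are illustrative assumptions:

```python
import numpy as np

# any 3x3 random-effect covariance matrix (illustrative values)
Sigma = np.array([[3.0, 1.0, 1.0],
                  [1.0, 2.0, 0.5],
                  [1.0, 0.5, 2.5]])
m = 3            # repetitions per Worker/Machine combination
s2 = 1.0         # residual variance

J = np.ones((m, m))                                    # m x m matrix of ones
Sigma_marginal = np.kron(J, Sigma) + s2 * np.eye(3 * m)  # 9x9 marginal matrix
```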
A note on sum contrasts
@statmerkur is asking about sum contrasts. Indeed, it is often recommended to use sum contrasts (contr.sum), especially when there are interactions in the model. I feel that this does not affect anything that I wrote above. E.g. the maximal model will still fit an unconstrained $\Sigma$, but the interpretation of its entries is going to be different (variances and covariances of the grand mean and deviations of A and B from the grand mean). The $\Sigma$ in m2 defined using contr.sum will have the same form as in (1+A+B || Worker) above, but again, with the different interpretation of the entries. Two further comments are:
1. Rune's critique of m2 still applies: this random effect structure does not treat A, B, and C on the same footing;
2. The recommendation to use sum contrasts makes sense for the fixed effects (in the presence of interactions). I don't see a reason to necessarily prefer sum contrasts for the random effects, so I think, if one wants to, one can safely use (1+A+B+C || Worker) even if the fixed part uses sum contrasts.
A note on custom contrasts
I had an email exchange with Reinhold Kliegl about this answer. Reinhold says that in his applied work he prefers (1+c1+c2 || subject) over (1+A+B+C || subject) because he chose c1 and c2 as some meaningful contrasts. He wants to be able to interpret $\Sigma$ and he wants its entries to correspond to c1 and c2.
This basically means that Reinhold is fine with rejecting the assumption (that I made above) that the factor levels should be treated equally. He does not care about individual factor levels at all! If so, then of course it is fine to use (1+c1+c2 || subject). He gives his paper https://www.frontiersin.org/articles/10.3389/fpsyg.2010.00238/full as an example. There a four-level factor is coded with 3 custom contrasts c1, c2, c3, with the grand mean as the intercept. These specific contrasts are of interest, not the individual factor levels A to D. In this situation I agree that (1+c1+c2+c3 || subject) makes total sense.
But one should be clear that while (1+c1+c2+c3 | subject) does treat factor levels A to D equally (and merely re-parametrizes $\Sigma$ in terms of particular contrasts), (1+c1+c2+c3 || subject) will fail to treat factor levels equally.