It allows the security of the construction to be reduced to second-preimage-resistance, rather than collision-resistance. This is a significant distinction, since brute force against collision-resistance is $2^{n/2}$ while brute force against second-preimage-resistance is $2^n$. In short, this XOR trick allows signatures/keys in the scheme to be half as long for the same level of security.
It is hard to express why the XOR trick helps the proof, without getting into the weeds. But here is a sketch of how to prove security of a Winternitz-like scheme.
The security reduction $\mathcal{S}$ plays the role of an adversary trying to break second-preimage resistance. It has to provide public keys and signatures to an adversary $\mathcal{A}$ trying to break unforgeability. In the event that $\mathcal{A}$ produces a signature forgery, $\mathcal{S}$ should break second-preimage resistance.
In plain Winternitz, we imagine a hash chain starting at $x_0$ and with $x_i = f(x_{i-1})$. The public key is $x_n$. The reduction $\mathcal{S}$ will guess an index $i^*$ such that $\mathcal{A}$ will ask for a signature value later than $i^*$ in the chain, but will generate a forgery consisting of a value at or before $i^*$ in the chain. If the guess is correct, then we want $\mathcal{S}$ to break second-preimage-resistance of $f$ somehow.
$\mathcal{S}$ is itself trying to break second-preimage-resistance. It will request a first-preimage $a$ and try to find a second preimage with $a' \ne a$ and $f(a) = f(a')$. It needs to incorporate $a$ into the hash chain somehow --- ideally, set $x_{i^*} = a$, so that if the adversary computes a preimage of $x_{i^*+1}$, it will be a second-preimage of $a$.
But if $x_{i^*}=a$, then what are $x_0, \ldots, x_{i^*-1}$ such that $x_i = f(x_{i-1})$? The reduction $\mathcal{S}$ can't make a consistent chain. It receives $a$ externally and would need to compute a preimage of $a$.
However, if we augment the scheme in the style of W-OTS+, we have additional $r_i$ values and $x_i = f(x_{i-1} \oplus r_{i-1})$. These $r_i$ values allow the reduction to "patch things together."
The reduction $\mathcal{S}$ chooses $x_0$ and computes a forward chain until $x_{i^*}$, as usual. Then it receives $a$ externally, and chooses $r_{i^*}$ so that $r_{i^*} \oplus x_{i^*} = a$. This means that the input into the next $f$ is $a$. This means that to make a forgery, $\mathcal{A}$ would have to find a preimage to go with $a$. From this point, the rest of the chain is computed normally.
So, the extra XOR just gives the reduction extra flexibility to insert a "trap" into the hash chain --- ensuring that a particular $a$ is used as input to $f$ in the chain, even when the reduction algorithm $\mathcal{S}$ didn't itself choose $a$. And this is exactly what's needed to reduce the signature security to second-preimage resistance.
Gravity-SPHINCS removes the XOR trick, thus relying on collision-resistance (see ia.cr/2017/933, sec. 3.4).
SPHINCS+ uses WOTS+ unchanged, thus relies on (second) preimage-resistance.
There are discussions on whether it makes it makes much of a difference in a quantum world to use one assumption or the other, and as far as I know there is no consensus yet.
– Thomas Prest Dec 10 '18 at 11:24By the way I'm marking the answer as valid today, just need some time to read :)
– peterwilli Dec 10 '18 at 13:01