If the full distribution $P(X, Y)$ is not computationally tractable, then we may choose to work with a simpler distribution $P(X)P(Y)$. In this case, $\text{KLD}(P(X, Y) || P(X)P(Y))$ will tell us how well the factored distribution approximates the full distribution we are actually interested in. If we are able to work with the full distribution of interest, there's usually no reason to see how well the full version approximates some simpler factored version.
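As a concrete illustration of this direction, here is a minimal sketch (the joint table and its numbers are made up for illustration) that computes $\text{KLD}(P(X, Y) || P(X)P(Y))$ for a small discrete joint; the divergence is zero exactly when $X$ and $Y$ are already independent, and grows as the factored approximation throws away more of the dependence.

```python
# A minimal sketch with a hypothetical 2x3 discrete joint P(X, Y), showing how
# KLD(P(X, Y) || P(X) P(Y)) measures what is lost by replacing the joint with
# the product of its marginals.
import numpy as np

# Hypothetical joint distribution over X (rows) and Y (columns); entries sum to 1.
P_xy = np.array([[0.20, 0.10, 0.05],
                 [0.05, 0.25, 0.35]])

P_x = P_xy.sum(axis=1, keepdims=True)   # marginal P(X)
P_y = P_xy.sum(axis=0, keepdims=True)   # marginal P(Y)
P_factored = P_x * P_y                  # factored approximation P(X) P(Y)

# KLD(P(X, Y) || P(X) P(Y)) = sum_{x,y} P(x, y) log( P(x, y) / (P(x) P(y)) )
kld = np.sum(P_xy * np.log(P_xy / P_factored))
print(f"KLD(P(X, Y) || P(X) P(Y)) = {kld:.4f} nats")
```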
However, the reverse direction does come up in variational Bayesian inference, where we want to learn a distribution over some number of unobserved variables, call them $Y$ and $Z$, while only observing $X$. Variational Bayes is used when the posterior of interest $P(Y, Z | X)$ is computationally intractable. (Typically it's intractable because there are many unobserved variables, rather than just the two $Y$ and $Z$.) To formulate a tractable optimization problem, variational Bayes introduces an approximating distribution $Q(Y, Z)$ over the unobserved variables and uses it to define a lower bound on the log marginal probability of the observed data:
\begin{align}
\log P(x) & = \log \sum_{y, z} P(x, y, z) \\
& = \log \sum_{y, z} Q(y, z) \frac{P(x, y, z)}{Q(y, z)}\\
& \geq \sum_{y, z} Q(y, z) \log \left(\frac{P(x, y, z)}{Q(y, z)}\right) \\
& = E_Q\left[\log P(x, y, z)\right] - E_Q\left[\log Q(y, z)\right]
\end{align}
where the inequality follows from Jensen's inequality (since $\log$ is concave, the log of an expectation is at least the expectation of the log). We can then learn $Q$ by maximizing this lower bound, often called the evidence lower bound (ELBO), on the log marginal:
\begin{align}
Q^*
& = \underset{Q}{\text{argmax}}~ E_Q\left[\log P(x, y, z)\right] - E_Q\left[\log Q(y, z)\right]
\end{align}
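To make this objective concrete, here is a small numerical sketch, assuming a made-up joint $P(x, y, z)$ over three binary variables and a single observed value of $x$. It evaluates the lower bound $E_Q[\log P(x, y, z)] - E_Q[\log Q(y, z)]$ for many candidate distributions $Q(y, z)$: no candidate exceeds $\log P(x)$, and the bound is tight when $Q$ equals the exact posterior $P(y, z | x)$.

```python
# A small numerical sketch of the maximization above, with a made-up joint
# P(x, y, z) over three binary variables and a fixed observation x.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint P(x, y, z), shape (2, 2, 2), entries sum to 1.
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

x_obs = 0
log_P_x = np.log(P[x_obs].sum())     # log marginal probability of the observation

def elbo(Q):
    """Lower bound E_Q[log P(x_obs, y, z)] - E_Q[log Q(y, z)] for a 2x2 Q summing to 1."""
    return np.sum(Q * (np.log(P[x_obs]) - np.log(Q)))

# Random search over candidate Q(y, z); no candidate should exceed log P(x).
best = max(elbo(rng.dirichlet(np.ones(4)).reshape(2, 2)) for _ in range(5000))
exact_posterior = P[x_obs] / P[x_obs].sum()

print(f"log P(x)               = {log_P_x:.4f}")
print(f"best ELBO from search   = {best:.4f}")
print(f"ELBO at exact posterior = {elbo(exact_posterior):.4f}")  # equals log P(x)
```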
We can see that this is equivalent to minimizing $\text{KLD}(Q || P)$: negate to turn the maximization into a minimization, add $\log P(x)$ (which doesn't affect the optimum since $\log P(x)$ doesn't depend on $Q$), and use $\log P(x, y, z) = \log P(y, z | x) + \log P(x)$:
\begin{align}
Q^* & = \underset{Q}{\text{argmin}}~ -\left(E_Q\left[\log P(x, y, z)\right] - E_Q\left[\log Q(y, z)\right]\right)\\
& = \underset{Q}{\text{argmin}}~
E_Q\left[\log Q(y, z)\right] - E_Q\left[\log P(x, y, z)\right] + \log P(x) \\
& = \underset{Q}{\text{argmin}}~
E_Q\left[\log Q(y, z)\right] - E_Q\left[\log P(y, z | x)\right]\\
& = \underset{Q}{\text{argmin}}~ \text{KLD}(Q(y, z) || P(y, z | x))
\end{align}
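This equivalence can also be checked numerically. The sketch below, again with a made-up binary model, verifies the identity $E_Q[\log P(x, y, z)] - E_Q[\log Q(y, z)] = \log P(x) - \text{KLD}(Q(y, z) || P(y, z | x))$ for arbitrary $Q$, which is why maximizing the lower bound and minimizing the KL divergence pick out the same $Q$.

```python
# A sketch verifying the identity behind the derivation above: for any Q(y, z),
#   ELBO(Q) = log P(x) - KLD(Q(y, z) || P(y, z | x)),
# so maximizing the ELBO is exactly minimizing the KL divergence to the posterior.
import numpy as np

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # hypothetical joint P(x, y, z)
x_obs = 0
posterior = P[x_obs] / P[x_obs].sum()            # exact posterior P(y, z | x)
log_P_x = np.log(P[x_obs].sum())

for _ in range(3):
    Q = rng.dirichlet(np.ones(4)).reshape(2, 2)  # a random candidate Q(y, z)
    elbo = np.sum(Q * (np.log(P[x_obs]) - np.log(Q)))
    kld = np.sum(Q * (np.log(Q) - np.log(posterior)))
    # Both sides agree up to floating-point error.
    print(f"ELBO + KLD = {elbo + kld:.6f},  log P(x) = {log_P_x:.6f}")
```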
To make the learning problem tractable, $Q$ is typically taken from a constrained family of distributions. A popular choice is the "mean-field" approximation, which severs all dependencies between the unobserved variables: $Q(y, z) = Q(y)Q(z)$. In this case, we are finding $Q(y)$ and $Q(z)$ to minimize $\text{KLD}(Q(y)Q(z) || P(y, z | x))$.
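For discrete variables, this mean-field problem can be solved by coordinate ascent, iterating the standard updates $Q(y) \propto \exp\left(E_{Q(z)}[\log P(x, y, z)]\right)$ and $Q(z) \propto \exp\left(E_{Q(y)}[\log P(x, y, z)]\right)$. The sketch below (again with a made-up binary joint) runs these updates and reports the resulting $\text{KLD}(Q(y)Q(z) || P(y, z | x))$.

```python
# A minimal mean-field (coordinate ascent) sketch for a made-up binary model,
# restricting Q to the factored form Q(y, z) = Q(y) Q(z).
import numpy as np

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # hypothetical joint P(x, y, z)
x_obs = 0
log_Pxyz = np.log(P[x_obs])                      # log P(x_obs, y, z), shape (2, 2)

q_y = np.array([0.5, 0.5])                       # initial factor Q(y)
q_z = np.array([0.5, 0.5])                       # initial factor Q(z)

for _ in range(50):                              # coordinate ascent on the lower bound
    q_y = np.exp(log_Pxyz @ q_z); q_y /= q_y.sum()   # Q(y) ∝ exp(E_{Q(z)}[log P(x, y, z)])
    q_z = np.exp(q_y @ log_Pxyz); q_z /= q_z.sum()   # Q(z) ∝ exp(E_{Q(y)}[log P(x, y, z)])

Q = np.outer(q_y, q_z)                           # factored approximation Q(y) Q(z)
posterior = P[x_obs] / P[x_obs].sum()
kld = np.sum(Q * (np.log(Q) - np.log(posterior)))
print(f"KLD(Q(y) Q(z) || P(y, z | x)) after mean-field updates = {kld:.4f}")
```

Because the family is restricted, the resulting divergence is typically positive: the factored $Q$ cannot represent any remaining dependence between $y$ and $z$ in the true posterior.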