I will answer on the basis of one particular application of MDPs. A few recent papers discuss the use of MDPs for search result diversification and introduce a model known as MDP-DIV (Markov Decision Process for Diversifying); for details, see Xia et al. (2017) [1].
The main causes of slow convergence identified there are data scarcity/sparse transition matrices, as you have pointed out, and a large action space.
Two new approaches to accelerating the convergence rate are detailed in Liu et al. (2018) [2]. Here are the relevant parts of the abstract:
The MDP-DIV-kNN method adopts a $k$ nearest neighbor strategy, i.e., discarding the $k$ nearest neighbors of the recently-selected action (document), to reduce the diversification searching space. The MDP-DIV-NTN employs a pre-trained diversification neural tensor network (NTN-DIV) as the evaluation model, and combines the results with MDP to produce the final ranking solution. The experiment results demonstrate that the two proposed methods indeed accelerate the convergence rate of the MDP-DIV, which is 3x faster, while the accuracies produced barely degrade, or even are better.
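To make the quoted $k$NN idea concrete, here is a minimal sketch (not from the paper) of pruning the action space in a greedy ranking loop: after each document is selected, its $k$ nearest neighbors among the remaining candidates are discarded. The function name `knn_pruned_ranking`, the use of cosine similarity over document embeddings, and the placeholder relevance scores are my own assumptions for illustration; the actual MDP-DIV-kNN method applies this pruning inside the full MDP policy-learning setup described in [2].

```python
import numpy as np

def knn_pruned_ranking(doc_embeddings, relevance_scores, k=5, n_select=10):
    """Greedy ranking sketch: after each selection, discard the k nearest
    neighbors (by cosine similarity) of the chosen document from the
    remaining candidate (action) set, shrinking the search space."""
    # Normalize embeddings so that dot products are cosine similarities.
    emb = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    candidates = set(range(len(emb)))
    ranking = []
    while candidates and len(ranking) < n_select:
        # Placeholder action selection: pick the highest-scoring remaining document.
        best = max(candidates, key=lambda i: relevance_scores[i])
        ranking.append(best)
        candidates.remove(best)
        # Discard the k candidates most similar to the document just selected.
        cand_list = list(candidates)
        sims = emb[cand_list] @ emb[best]
        for idx in np.argsort(-sims)[:k]:
            candidates.discard(cand_list[idx])
    return ranking

# Toy usage: 50 random "documents" with random embeddings and relevance scores.
rng = np.random.default_rng(0)
docs = rng.normal(size=(50, 16))
scores = rng.random(50)
print(knn_pruned_ranking(docs, scores, k=5, n_select=10))
```

The point of the sketch is only the pruning step: each selection removes $k+1$ documents from the candidate set, so the space searched at each subsequent step is strictly smaller.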
References
[1] Xia, L., Xu, J., Lan, Y., Guo, J., Zeng, W., Cheng, X. (2017). Adapting Markov decision process for search result diversification. In Proceedings of SIGIR ’17, pp. 535–544. ACM, New York, NY, USA.
[2] Liu, F., Tang, R., Li, X., Ye, Y., Guo, H., He, X. (2018). Novel approaches to accelerating the convergence rate of Markov decision process for search result diversification. https://arxiv.org/pdf/1802.08401.pdf