Probably the most common one in machine learning usage is the mean map kernel (the kernel that induces the maximum mean discrepancy [MMD] distance). Here we define an auxiliary kernel $\kappa$, and then $$k(S, T) = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n \kappa(s_i, t_j).$$
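For concreteness, here's a minimal NumPy sketch of this estimator, assuming a Gaussian RBF for $\kappa$ (the function name and bandwidth parameter are just illustrative, not from any particular library):

```python
import numpy as np

def mean_map_kernel(S, T, sigma=1.0):
    """k(S, T): mean of kappa(s_i, t_j) over all pairs, with kappa a
    Gaussian RBF of bandwidth sigma. S is (m, d), T is (n, d)."""
    # (m, n) matrix of squared Euclidean distances ||s_i - t_j||^2
    sq_dists = np.sum((S[:, None, :] - T[None, :, :]) ** 2, axis=-1)
    # kappa(s_i, t_j) = exp(-||s_i - t_j||^2 / (2 sigma^2)), then average
    return np.exp(-sq_dists / (2 * sigma ** 2)).mean()

rng = np.random.default_rng(0)
S = rng.normal(0.0, 1.0, size=(100, 2))   # sample from P
T = rng.normal(0.5, 1.0, size=(150, 2))   # sample from Q
print(mean_map_kernel(S, T, sigma=1.0))
```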
This essentially assumes that $S$ and $T$ are iid samples from some distributions $P$ and $Q$, and estimates a distance between $P$ and $Q$. If you use a Gaussian RBF kernel of bandwidth $\sigma$ for $\kappa$, the mean map embedding distance converges to a multiple of the $L_2$ distance between the densities as $\sigma \to 0$, but you'd typically use some fixed, larger $\sigma$, which gives better statistical estimation properties.
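To spell out the connection between the kernel and the distance: the usual (biased) estimate of the squared MMD is built directly from the mean map kernel, $$\widehat{\mathrm{MMD}}^2(S, T) = k(S, S) + k(T, T) - 2\,k(S, T),$$ so the kernel and the distance are really two views of the same object.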
You can see an overview of many such kernels, and estimators for them, in my PhD thesis from last year.