
Can anyone give some examples of widely-used kernels between sets of vectors?

A kernel between sets of vectors is a positive semidefinite map $k:(S,T)\mapsto k(S,T)\in \mathbb{R}$ where $S$ and $T$ are sets of vectors (i.e., $S=\{s_1,s_2,\ldots,s_m\}$, $T=\{t_1,t_2,\ldots,t_n\}$, and $s_i,t_j\in \mathbb{R}^d$).

Thank you!

Alex
  • There are quite a few in the Gaussian Processes for Machine Learning book. Probably the most common is the squared exponential kernel, where you give a positive definite matrix $\Sigma$ and the kernel is $\kappa(x,y; \Sigma) = \exp\left(-\tfrac{1}{2}(x-y)^\top \Sigma^{-1} (x-y)\right)$. Using this kernel is akin to assuming a Gaussian relationship between the components of $x_i$. Another excellent reference is the kernel cookbook, which has a good set of visualizations to go with it. – combo Jan 24 '17 at 21:55
  • @StefanJorgensen I think you missed that the inputs to the kernel are supposed to be sets of vectors, not single vectors. – Danica Jan 24 '17 at 22:10
  • Indeed. But is it fundamentally different? The collection of vectors $S = {s_1,\dots,s_m}$ is just a matrix, which one could vectorize and use the standard kernels for, right? – combo Jan 25 '17 at 16:22
  • 2
    @StefanJorgensen Sure...if you're willing to only support sets of a single fixed size, and to require factorially times as many inputs for the learning method to pick up on the set's permutation invariance. – Danica Jan 30 '17 at 23:47

1 Answer


Probably the most common one in machine learning usage is the mean map kernel (the kernel induced by the maximum mean discrepancy [MMD] distance). Here we define an auxiliary kernel $\kappa$, and then, with $|S| = m$ and $|T| = n$, $$k(S, T) = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n \kappa(s_i, t_j).$$

This essentially makes the assumption that $S$ and $T$ are iid samples from some distributions $P$ and $Q$, and estimates a similarity between $P$ and $Q$. If you use a Gaussian RBF kernel of bandwidth $\sigma$, the mean map embedding distance converges to a multiple of the $L_2$ distance between the densities as $\sigma \to 0$, but you'd typically use some fixed larger $\sigma$, which gives better statistical estimation properties.
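As a minimal NumPy sketch of the formula above, assuming a Gaussian RBF auxiliary kernel $\kappa$ (the function name and default bandwidth here are illustrative, not from any particular library):

```python
import numpy as np

def mean_map_kernel(S, T, sigma=1.0):
    """Mean map (MMD-induced) kernel between two sets of vectors.

    S is an (m, d) array, T an (n, d) array; the auxiliary kernel is a
    Gaussian RBF of bandwidth sigma. Returns the average of
    kappa(s_i, t_j) over all m * n pairs.
    """
    # Pairwise squared Euclidean distances between rows of S and rows of T
    sq_dists = np.sum((S[:, None, :] - T[None, :, :]) ** 2, axis=-1)
    # Average the RBF auxiliary kernel over all pairs (s_i, t_j)
    return np.exp(-sq_dists / (2.0 * sigma ** 2)).mean()
```

Since the auxiliary kernel is symmetric and bounded in $(0, 1]$, the set kernel is too, and `mean_map_kernel(S, T) == mean_map_kernel(T, S)` for sets of any (possibly different) sizes.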

You can see an overview of many such kernels and estimators of them in, uh, my PhD thesis from last year.

Danica