I am wondering why, in this work (https://github.com/facebookresearch/dlrm), the authors calculate the dot product of the embeddings with their transpose. Here is the relevant sentence from their paper, in the last paragraph of the third page:
"We will compute second-order interaction of different features explicitly, following the intuition for handling sparse data provided in FMs (factorization machines), optionally passing them through MLPs. This is done by taking the dot product between all pairs of embedding vectors and processed dense features."
Here is a picture of their model's architecture, from their GitHub repository:
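To make my question concrete, here is a minimal sketch of the interaction step as I understand it. The shapes and variable names (`B`, `n`, `d`, `T`, `Z`) are my own assumptions for illustration, not the authors' exact code:

```python
import torch

# Assumed shapes: B = batch size, n = number of feature vectors
# (processed dense features + sparse embeddings), d = embedding dimension.
B, n, d = 4, 3, 8

# Stack the processed dense features and the embedding vectors per sample.
T = torch.randn(B, n, d)

# Multiply by the transpose: Z[b, i, j] is the dot product between
# feature vector i and feature vector j of sample b.
Z = torch.bmm(T, T.transpose(1, 2))  # shape (B, n, n)

# Z is symmetric, so only the strictly lower-triangular entries are
# kept as the pairwise (second-order) interaction terms.
li, lj = torch.tril_indices(n, n, offset=-1)
interactions = Z[:, li, lj]  # shape (B, n * (n - 1) // 2)
```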
I cannot understand the intuition behind the dot product. Why does the dot product compute a second-order interaction? I have studied the factorization machine method, but I still could not grasp the intuition. Can anyone point me to some sources to study, or clear it up for me?
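For reference, this is the factorization machine model I studied, where each feature $i$ is assigned a latent vector $\mathbf{v}_i$ and the last term models the second-order interactions:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$$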
