
I am evaluating a proposal pitch from a vendor about their machine learning solution. I do not have access to the source code or any other technical details about the algorithm they are using, as it is their IP. But on the only technical slide of the PPT, they mention that they are using a self-learning neural network (the actual text is given below). Initially this sounded like reinforcement learning to me. But in the very next line they mention the frameworks behind the solution, and the big surprise to me was that they listed 1) Probability Density Function and 2) Clustering. In the machine learning world, these two are not frameworks, and this immediately made me suspicious about the caliber of the vendor. Honestly, I was expecting a deep learning framework like PyTorch or TensorFlow, and "probability density function" listed as a framework is a big red flag for me about the vendor's technical knowledge. However, I could be wrong. So my question is:

Question: What would be your perception or first impression of a vendor who puts probability density functions forward as a core framework for something they claim to be a self-learning neural network?

Text from PPT slide

We use self-learning, to provide employers with the best matching candidates to their open jobs. The AI is built on smart neural networks that learn from the behaviors and activities of users as well as employers' activities. The below are some of the frameworks used in the backend in providing informative hiring decisions for recruiters:

  • Probability Density Functions – the use of normal probability distributions, such as Gaussian distribution, which will help better analyze the users’ data in the system which will eventually cater to the needs of each employer based on their history, peer vector analysis, scoring modules and other technical information to provide the best matching talent. This would help employers in their pipeline building as well as in hiring their needs efficiently and in a timely manner.
Stats IT
  • I have no way of knowing if this is what the vendor is doing, but one place PDFs may appear in neural networks is in constructing loss functions. See: https://stats.stackexchange.com/q/378274/22311 – Sycorax Jul 19 '23 at 08:05
  • @Sycorax-OnStrike Maybe, but even if that were the case, it would be way too cryptic; why mention it at all as your key framework, especially when PDFs are basically entry-level Statistics 101, is something I fail to understand – Stats IT Jul 19 '23 at 08:16
  • Yeah, I agree that although PDFs are indeed related to neural networks, they are just too elementary to mention. – lpounng Jul 19 '23 at 08:32
  • Regarding the edit, it's impossible to know definitively what they mean without more information. However, the slides were likely written by non-experts like sales & marketing folks. In particular, the phrase "the use of normal probability distributions, such as Gaussian distribution," indicates to me that this was not written by someone with even a stats 101-level understanding, because a Gaussian distribution and a normal distribution are the same thing. – Sycorax Jul 19 '23 at 08:39
  • The pitch is likely made by marketing people - not by statisticians or machine learning experts. However, if the latter are the target audience, it is a serious failure... If the audience are non-experts, then mentioning probabilities and clustering is merely intended to demonstrate that there is some serious theory behind what they do (at least a real statistician or a computer scientist, rather than someone simply using sklearn). Welcome to the real world! ;) – Roger V. Jul 20 '23 at 09:38

3 Answers


This is buzzwordery at its worst. How is a Gaussian used? Why a Gaussian and not some other distribution?

Then again, their pitch deck was almost certainly created not by the people with the actual statistical knowledge*, but by marketing and/or product management, and while PM typically has a little more technical understanding than marketing, that is emphatically not the same as statistical understanding. Their statistics may be absolutely fine but may have been garbled in a game of telephone during the production of these pitch slides. For all we know, the original incarnation of the slides may have been completely correct statistically, but then "improved" by various layers of the bureaucracy for better "understanding" in the target group - which usually does not include statisticians.

Solution: ask. Ask the vendor for a call specifically with the statistical expert, and have a technical discussion with them.


*Source: long and painful experience in a large company creating a product with core statistical functionality, which is then sold to customers with possibly less statistical understanding.

Stephan Kolassa

I would read the text provided on the slide as follows:

  • Either they use some kind of probabilistic model. Such models are commonly used; there is nothing strange about that.
  • Alternatively, they are using fancy language to say that they fit the model with squared error. This would be equivalent to using maximum likelihood with a Gaussian distribution, so "you can think of" the results as fitting Gaussians to the data (see the sketch right after this list).
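
To make the second bullet concrete, here is a minimal sketch - a toy linear model with made-up data, not the vendor's method - showing that minimizing squared error and maximizing a Gaussian log-likelihood recover (numerically) the same coefficients:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: y = 2x + 1 + noise (purely illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

def mse(params):
    a, b = params
    return np.mean((y - (a * x + b)) ** 2)

def gaussian_nll(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - (a * x + b)
    # Negative log-likelihood of N(a*x + b, sigma^2)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + resid**2 / (2 * sigma**2))

fit_lsq = minimize(mse, x0=[0.0, 0.0])
fit_mle = minimize(gaussian_nll, x0=[0.0, 0.0, 0.0])

print(fit_lsq.x)      # slope and intercept from least squares
print(fit_mle.x[:2])  # same slope and intercept from Gaussian maximum likelihood
```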

The description, however, is too vague to tell exactly what it is, and it was definitely not written by an expert in this area. From the description alone you simply cannot tell what the model is or whether it makes sense.

Answering the question in your title: yes, it is pretty common to interpret neural networks in probabilistic terms. For example, Goodfellow et al. (2016), in the classic deep learning textbook, say:

Given features $h$, a layer of linear output units produces a vector $\hat y = W^\top h + b$.

Linear output layers are often used to produce the mean of a conditional Gaussian distribution:

$$ p(y \mid x) = \mathcal{N}(y; \hat y, I) \tag{6.17} $$

Maximizing the log-likelihood is then equivalent to minimizing the mean squared error. [...]
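
To spell out the equivalence (my own one-line derivation in the notation of eq. (6.17), where $d$ is the dimension of $y$): taking the negative log of the conditional Gaussian gives

$$ -\log p(y \mid x) = \frac{1}{2} \lVert y - \hat y \rVert^2 + \frac{d}{2} \log(2\pi), $$

and the constant term does not depend on the network, so maximizing the log-likelihood over $W$ and $b$ selects exactly the same parameters as minimizing the squared error.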

Tim

I believe their method is to have a separate encoder for user and item information. Each encoder outputs the mean and variance of a Gaussian in an embedding space ("probability density function framework"). Then, they use the Wasserstein distance between users and items as a measure of similarity. Finally, they use some sort of representation-learning loss formulation, such as InfoNCE, to fit the model ("self-learning"?). A rough sketch of this guess is given below.

Sounds similar to Probabilistic Metric Learning with Adaptive Margin for Top-K Recommendation (manuscript).
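
If that guess is roughly right, a minimal sketch could look like the following - all module names, dimensions, and the InfoNCE formulation are my own assumptions for illustration, not the vendor's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEncoder(nn.Module):
    """Maps input features to a diagonal Gaussian (mean, std) in embedding space."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu_head = nn.Linear(64, emb_dim)
        self.log_sigma_head = nn.Linear(64, emb_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu_head(h), self.log_sigma_head(h).exp()

def wasserstein2_sq(mu1, sigma1, mu2, sigma2):
    # Squared 2-Wasserstein distance between diagonal Gaussians
    return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

def info_nce_loss(user_mu, user_sigma, item_mu, item_sigma, temperature=0.1):
    # Pairwise (batch x batch) distances; matched user/item pairs are the positives.
    d = wasserstein2_sq(user_mu.unsqueeze(1), user_sigma.unsqueeze(1),
                        item_mu.unsqueeze(0), item_sigma.unsqueeze(0))
    logits = -d / temperature  # higher = more similar
    targets = torch.arange(logits.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage with random user/job features (shapes are made up)
user_enc, item_enc = GaussianEncoder(16, 8), GaussianEncoder(24, 8)
users, items = torch.randn(32, 16), torch.randn(32, 24)
loss = info_nce_loss(*user_enc(users), *item_enc(items))
loss.backward()
```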