
Is there a machine learning technique that can be used to detect the slightest change in data? I know this can be done using a hash, but I was just wondering if there is any machine learning technique out there that can do this as well, or close to it.

1 Answer


In comments, OP has explained that this proposal isn't oriented around solving any particular problem. While it might have some utility in a specific use-case, the general idea doesn't make a whole lot of sense.

Suppose your data is represented by a vector, and you want to know if there is a "slight change" (to use OP's words) between vector $x$ and vector $y$. This is easy enough: just measure a norm, such as the L2 norm. If the norm is positive, then there is a change! We can even be more specific about whether a change was "slight" or "large." Choose some $\epsilon > 0$:

  • If $\| x - y \| = 0$, then there is no change.
  • If $0 < \| x - y \| < \epsilon$, then the change is "slight."
  • If $\| x - y \| \geq \epsilon$, then the change is "large."
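The three cases above can be sketched in a few lines of NumPy. The function name `classify_change` and the threshold value are my own choices for illustration:

```python
import numpy as np

def classify_change(x, y, eps=1e-3):
    """Classify the difference between two vectors by its L2 norm."""
    d = np.linalg.norm(x - y)
    if d == 0:
        return "none"
    elif d < eps:
        return "slight"
    else:
        return "large"

x = np.array([1.0, 2.0, 3.0])
print(classify_change(x, x))          # identical vectors -> "none"
print(classify_change(x, x + 1e-6))   # tiny perturbation -> "slight"
print(classify_change(x, x + 1.0))    # big perturbation  -> "large"
```

Note that with floating-point data you would typically never test for an exact zero in practice; the `eps` cutoff does the real work.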

You can do similar comparisons for other types of data.

As a general observation, the purpose of ML is to generalize from a limited sample, so that in the future, new inputs that "look like" inputs used during training are mapped to outputs that are similar to training outputs. In this sense, hashing to detect slight changes and machine learning are opposite, because a robust machine learning method will not be overly sensitive to minute changes in the input (at least, it won't be sensitive to minute changes whenever those minute changes are not important to the output). For example, adversarial inputs are a problem in computer vision, because imperceptibly small changes to the input image can cause dramatic changes to the classification (see: Can't deep learning models now be said to be interpretable? Are nodes features?).

Locality-sensitive hashing (LSH) is a method to group semantically similar objects together in an unsupervised manner by constructing hash codes in a specific way, such as grouping near-duplicate text documents together. Sometimes, LSH uses machine learning methods to achieve this. However, a good LSH scheme should ignore "slight changes" to the input. In the near-duplicate text document example, we want some document and the same document with a handful of misspellings and comma splices to be grouped together. So this is actually the opposite of being sensitive to "slight changes."

Sycorax