6

When we have two classes, A and B, can we use a single classifier that is trained on class A samples and tested on both class A and class B samples?

When should I use a binary classifier, and when a one-class classifier?

Can I use a one-class classifier for any type of binary-class data?

  • Example 1: Faulty and non-faulty machinery data. Can I implement a one-class classifier?

  • Example 2: Positive and negative sentiment text data. Can I implement a one-class classifier?

2 Answers

10

Suppose you are trying to perform two-class classification on faulty and non-faulty machinery data, where each example in the dataset is represented using the feature vector $\mathbf{x} = [x_1 \ x_2]^T$. This could be done as shown below:

[Figure: a two-class decision boundary separating the faulty (orange) and non-faulty (blue) regions of the $(x_1, x_2)$ feature space]

Here, the faulty machinery data is represented by the orange area, and the non-faulty machinery data is represented by the blue area. Suppose that the machines in question are rarely ever faulty, so that there are many different examples of non-faulty machinery but very few examples of faulty machinery. Given that you have trained on the data shown above, it is possible that you observe an example of non-faulty machinery that you misclassify as faulty:

[Figure: a new non-faulty example falling on the faulty side of the learned decision boundary]

Again, this happens because the set of all possible feature vectors representing non-faulty machinery is just too big. It is not possible to capture all of them and train on them. You could argue that you just need to collect more data and train on that, but what you would end up doing is this:

[Figure: ever more decision boundaries drawn as more and more non-faulty data is collected]

This, by the way, can be done using a neural network, for example, and the act of drawing these lines is the basic idea behind discriminative modelling.
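As a concrete, purely illustrative sketch of that discriminative approach (not from the original answer), assuming scikit-learn and synthetic 2-D data in place of real machinery features:

```python
# Hypothetical sketch: a small neural network trained on BOTH classes,
# learning a decision boundary that splits the feature space.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
faulty = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))       # label 1
non_faulty = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))  # label 0

X = np.vstack([faulty, non_faulty])
y = np.hstack([np.ones(len(faulty)), np.zeros(len(non_faulty))])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)

print(clf.predict([[2.1, 1.9], [-3.0, 0.5]]))  # e.g. [1. 0.]
```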

However, why bother collecting so much data and creating a very complex model that can draw all of these lines, when you can just try to draw the shape of the faulty data like this?

[Figure: a closed boundary drawn around the faulty data only, enclosing its estimated distribution]

In the figure above, the orange area is the faulty machinery data that you collected, while the rest of the feature space is assumed to represent non-faulty machinery. This is the basic idea behind generative modelling, where, instead of trying to draw a line that splits the feature space, you instead try to estimate the distribution of the faulty data to learn what it looks like. Then, given a new test vector $\mathbf{x}$, all you need to do is to measure the distance between the center of the distribution of the faulty data and this new test vector $\mathbf{x}$. If this distance is greater than a specific threshold, then the test vector is classified as non-faulty. Otherwise, it is classified as faulty. This is also one way of performing anomaly detection.
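As a rough illustration of that thresholding idea (again, a sketch rather than the answer's actual method), you could fit a single Gaussian to the faulty examples only and threshold the Mahalanobis distance of a new test vector $\mathbf{x}$; the data and threshold below are made up:

```python
# Minimal sketch: model the faulty class only, then threshold the distance
# of a new test vector to the center of that distribution.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the faulty-class training data (2-D feature vectors).
faulty = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))

# Estimate the distribution of the faulty data: mean and covariance.
mu = faulty.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(faulty, rowvar=False))

def classify(x, threshold=3.0):
    """Label x as 'faulty' if it lies close to the faulty distribution."""
    d = np.sqrt((x - mu) @ cov_inv @ (x - mu))  # Mahalanobis distance to the center
    return "faulty" if d <= threshold else "non-faulty"

print(classify(np.array([2.1, 1.9])))   # near the faulty cluster -> 'faulty'
print(classify(np.array([-3.0, 0.5])))  # far from it -> 'non-faulty'
```

A single Gaussian is just the simplest choice for the sketch; any density estimate of the faulty class would serve the same purpose.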

mhdadk
  • Thanks for your explanation! Consider the second example of text data: can we train a model only on positive-sentiment sentences to classify samples of positive and negative sentiment, given that we have a large number of samples for both positive and negative sentiment? – SS Varshini Jan 18 '21 at 11:54
  • That depends. How many possible sentences are there for each class? If there are a lot of examples for one class compared to the other class, then consider it as an anomaly detection task. – mhdadk Jan 18 '21 at 11:57
  • Does that mean that if we have a balanced dataset we should use a binary classifier, and otherwise a one-class classifier? – SS Varshini Jan 18 '21 at 12:01
  • Not really. If one class is very specific, while another class is very general, then one-class classification is the way to go. For example, a faulty machine is a very specific example, but a non-faulty one is not. This is because the machine can be doing many different things while it is not faulty. Another example, in the case of audio, is if I want to detect whether someone said a specific word or not. I will be scanning audio signals for a long time looking for the word, and I will have to classify all kinds of different audio as a negative... – mhdadk Jan 18 '21 at 17:02
  • Basically, if one class is much more diverse than the other class, one-class classification is a good idea. However, if you only want to classify cats and dogs, where both classes are almost equally diverse, then two-class classification is a better solution. – mhdadk Jan 18 '21 at 17:03
  • OK, I got it. Thanks for your explanation @mhdadk – SS Varshini Jan 18 '21 at 17:43
  • Sir, I have a balanced binary-class dataset with classes A and B. I generated samples belonging to class A. Can I use a one-class classifier now? – SS Varshini Feb 14 '21 at 03:56
  • I strongly disagree with your recommendation to use a single-class classifier, for two reasons. First, a single-class classifier tells us nothing about probabilities. If one region has twice the density of failures, you might infer that a point in that region is more likely to fail than one in another area. But what if there are 100X more non-failures in that region? Your example only makes sense because the failure region does not overlap the non-failure region. – Ryan Volpi Apr 22 '22 at 14:09
  • Second, with a single-class classifier, we do not learn whether the sample is close to the failure region in any meaningful way. If there are many dimensions, positive and negative cases may be separable on some specific subset, but a single-class classifier will have no idea which subset of features is important for discriminating classes because it only considers one class. – Ryan Volpi Apr 22 '22 at 14:10
  • For fault detection/quality control I'd have said that usually the positive (and mathematically well-defined) class is the non-faulty one: typically the product can be well specified, plus in practice many samples are available. In contrast, faults can be anything, including new, previously unknown ways of failing. – cbeleites unhappy with SX Apr 22 '22 at 15:22
  • @RyanVolpi: with a one-class classifier, we learn the probability of a region holding cases from the positive (one) class. And all distance-based one-class classifiers inherently tell us about proximity to the does-not-belong threshold. But we look at the absolute posterior probability of belonging, not relative to the absolute posterior probability of the examples from outside the class. The conclusion would then either be that the region belongs to the highly present class, or maybe that it is something not seen before. – cbeleites unhappy with SX Apr 22 '22 at 15:29
  • @cbeleitesunhappywithSX I think your first comment is just describing the use of a one class classifier for some form of outlier detection, is it not? That seems different from what the above answer is recommending. – Ryan Volpi Apr 22 '22 at 16:07
  • @cbeleitesunhappywithSX Sorry, I don't quite understand your second comment. Would you mind explaining what probability we get from the one class classifier and how we can turn this probability into something useful for classification? – Ryan Volpi Apr 22 '22 at 16:08
  • @RyanVolpi: what I'm trying to say is that I'd set up the one-class classifier the opposite way from what this answer suggests, with the non-faulty class as the positive (one) class. – cbeleites unhappy with SX Apr 22 '22 at 16:08
  • @cbeleitesunhappywithSX If we assume failures are going to be outliers that seems fine (though it's the reverse of the example above). But that the negative class will be outside the distribution of positive classes is only an assumption. And I don't think that is a sound assumption in many classification cases. – Ryan Volpi Apr 22 '22 at 16:17
  • @RyanVolpi: I'll start with the no-overlap assumption. It is certainly possible that the set of measurements we have does not allow perfect distinction. It may be that our "duck" one-class classifier says "duck" for a rubber duck, too. However, we could set up a "rubber duck" classifier in addition, and then both the duck and the rubber duck classifiers would predict "belongs". There are situations where this behaviour is actually desired (e.g. general medical diagnosis as opposed to differential diagnosis which is a discriminative problem). However, if faulty and non-faulty overlap in ... – cbeleites unhappy with SX Apr 22 '22 at 16:29
6

You could, but why would you? There is much more to gain by training a model on both classes, so that it can learn from both and better distinguish between them. Imagine trying to learn to distinguish between cats and dogs: would it help you to see only images of cats, and then try to guess which image is a dog, without ever having seen a dog?

Using a one-class classifier would make sense only if you do not have any examples from the other class at this particular point in time, or if the "other" class could be composed of many unknown classes that cannot easily be grouped into one official class. For example, in anomaly detection you may have only a few very weird cases, which could arise from many varied causes.
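As a hedged sketch of that setting (the data and parameters below are made up), you could fit a standard one-class model such as scikit-learn's OneClassSVM on the single class you do have examples for, and flag everything it does not recognize as an anomaly:

```python
# Illustrative sketch: train on the only class we have examples for, and
# treat anything unlike it as belonging to the unknown "other" class(es).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
known_class = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # the one known class

ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(known_class)

# +1 = looks like the training class, -1 = anomaly (any of the many
# possible unknown classes we never saw during training).
print(ocsvm.predict([[0.2, -0.1], [6.0, 6.0]]))  # e.g. [ 1 -1]
```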

user2974951
  • Even if they have access to both sets of examples, one-class classification could still be helpful. This occurs when the size of the set of all possible examples from one class greatly exceeds the size of the set of all possible examples from the other class. – mhdadk Jan 18 '21 at 09:28
  • Training one-class models for cats and dogs does help a lot when images of, say, foxes or birds come in, which should be classified as neither dog nor cat. Also, practically speaking, you deploy your dog recognition, then your cat recognition, and have other animals or objects follow as you achieve sufficient performance. – cbeleites unhappy with SX Apr 22 '22 at 15:31