2

I have two groups of vectors A = [[0 0 0 0 1], [0 0 0 1 0], …] and B = [[0 0 1 0 1], [0 1 0 0 1], …]. I want to find the pattern that differentiates the vectors in A from the vectors in B. How to do that? Example output:

  • if both the 2nd and 3rd term and/or the 4th and 5th term are both equal to 0, then the vector belongs to group A
  • the first term doesn’t tell anything about whether the vector belongs to group A or B
  • ecc..
  • A simple approach would be a logistic regression, possibly using interactions. Alternatively, tree-based classifiers. How many vectors do you have, how many per group, and are they all of length 5? – Stephan Kolassa Jan 11 '24 at 12:29
  • In this specific case I have all vectors of length 5 and 13 vectors in A and 18 vectors in B. – user37959 Jan 11 '24 at 12:31
  • 2
    31 data points is extremely little data. Try a classification tree, but be aware that it will likely not perform well on new data. – Stephan Kolassa Jan 11 '24 at 12:33
  • That’s why I said in this specific case, the vectors in the 2 groups will always have the same length (which may be more than 5) and the number of vectors in A and in B may be also higher. I’m just using this small set because it is easier to start with, I can see the pattern myself with a smaller amount of data. What I want to find is a set of “rules” that differentiates the vectors in A from the vectors in B. A and B change each time. – user37959 Jan 11 '24 at 12:52
  • Each vector represents a state. The vectors in A are the states in which the operation is executed successfully, and the vectors in B are the states in which the operation fails. (I don’t know if this part was explained well enough) I want to find the rules that differentiate A from B in order to be able to explain why the operation fails. – user37959 Jan 11 '24 at 12:53
  • 2
    Your question itself is phrased in exactly the same way a classification tree would present its results. Assuming these are binary vectors, the dataset is far too small to apply logistic regression: to have much hope of success with that, you would need a very well-balanced dataset of many hundreds of vectors in each group at the bare minimum. – whuber Jan 11 '24 at 13:49

0 Answers0