0

My question come across from this post: How to know that your machine learning problem is hopeless?

Is there any mathmatical or statistical way to prove that my machine learning problem is hopeless?

I have a project which tries to predict binary classes (0 and 1). However, I try the several models (some DL and classical ML) but model always give 50% of val_accuracy, which the models do not learn well. I have tried fine-tuning, feature engineering, dimensional reduction, and normalization/ stadardization, but it did not work.

How can I use mathmatical or statistical way to prove my ML problem is hopeless or is just not preditable?

1 Answers1

0

The closest mathematical problem to your question is to prove that two sets of random variables are statistically independent (in the context of a machine learning classifier those sets would be (i) the outcome, call it y, and (ii) the features, call them X). I guess you cannot solve the problem in all generality, but you can if you know/assume something regarding the distribution of the variables, and you can do that numerically to a certain level of confidence.

You basically have to prove that:

P(X,y)=P(X)*P(y)

i.e., that the joint probability distribution of the two random variables can be factored into the product of their individual probability distributions.

This for an academic answer. In reality, one of the advantage of a machine learning approach is that you do not need to know the probability distribution of your variables, and typically you do not know it/you are not very interested in it - just if you can find a predictive relationship between the two sets, that generalizes to unseen data (of the same type). So there is no way to prove it definitely, that there won't be an algorithm you have not tried yet that would be able to perform classification above chance (or find any other predictive relationship between the features and the outcome). You can just try new algorithms until you are tired/hopeless.

fabiob
  • 702