
I am trying to understand theory from my Model Identification And Data Analysis course at University.

The example I am referring to is predicting the probability of a heart attack. Essentially, from my dataset Dn I take information such as age, cholesterol level, activity level, etc., and I feed it to my ideal function, which returns a conditional probability P(y=+1|x), where y=+1 means a heart attack is likely to occur.

What is not exactly clear to me is why, at this stage, we then make a random prediction based on this probability and take the output y_red of that prediction as the result. My questions are:

  1. Wouldn’t it be sufficient to simply accept the conditional probability result as the output of our model?
  2. What exactly does it mean to generate a random prediction from a probability? I have looked up inverse transform sampling, but it’s not very clear to me.

Thanks


2 Answers

  • You don't need to generate the random predictions. You may want to do this if you would like to simulate the distribution of possible outcomes, e.g., drawing multiple random predictions for different $X$'s.
  • Predicting the probability $p(y|X)$ and taking this as a prediction is perfectly fine. See, for example, the Why isn't Logistic Regression called Logistic Classification? thread. Many people would argue that this is the most reasonable thing to do.
  • Alternatively, if you need a binary classification, you can use a threshold such that if the probability is larger than some value, you predict one outcome, otherwise another.
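As a minimal sketch of that last point (the probabilities and the 0.5 cutoff are illustrative, not from the question):

```python
import numpy as np

# Hypothetical predicted probabilities P(y=+1 | x) for five patients
probs = np.array([0.06, 0.20, 0.45, 0.62, 0.90])

# The threshold is a free choice that should reflect the relative costs
# of false positives vs. false negatives; 0.5 is just a common default.
threshold = 0.5
labels = np.where(probs > threshold, +1, -1)

print(labels)  # [-1 -1 -1  1  1]
```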
Tim
  • Thanks for your kind answer. Any other reason this might be done? – Mattia Iezzi Sep 25 '23 at 10:15
  • @MattiaIezzi many other reasons, e.g. you are predicting the success of a marketing campaign and want to use the prediction to randomize assignment to the campaign based on the probability of success. – Tim Sep 26 '23 at 21:39

Yes, the output probability should be sufficient. If you predict a 6 or 20 or 90 percent chance of a heart attack, that is the prediction. You can evaluate the predictions to determine how much to trust what your model predicts (a model might give terrible or unreliable predictions), but that predicted probability is the model output.

Generating a random sample from a probability means, essentially, that you flip a coin whose probability of landing heads equals the given probability, and then you check whether it came up heads. Inverse transform sampling is the underlying mathematics of doing this (basically how a software function like rbinom or np.random.binomial works), but that is the gist of what is happening. This is a way to convert your predicted probability into a categorical prediction. Statistical practice usually advocates for direct evaluation of the predicted probability values, rather than turning those values into categories.
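As a sketch of that coin flip (the probability value is hypothetical), both the library call and the inverse-transform view give the same kind of Bernoulli draw:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.62  # hypothetical predicted probability of a heart attack

# Library call: one Bernoulli trial with success probability p
y1 = rng.binomial(n=1, p=p)

# Inverse transform sampling for a Bernoulli variable:
# draw U ~ Uniform(0, 1) and output 1 exactly when U < p
u = rng.uniform()
y2 = int(u < p)

# Over many draws, the long-run frequency of 1's is close to p
samples = rng.binomial(n=1, p=p, size=100_000)
print(y1, y2, round(samples.mean(), 2))
```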

Thus, to answer the title question of why you have to turn the predicted probability values into discrete categories (e.g., heart attack vs. not, instead of probability of a heart attack), I would say that you definitely don’t have to, and probably shouldn’t even though you can, and I recommend pressing your instructor on this. If the answer is that you need some way to evaluate the accuracy of your model, the link I gave gets into why that is not the case, and multiple other links on my profile discuss this in various other ways. This issue often, but not always, arises in the context of class imbalance.

Dave
  • I think I explained myself poorly here. Y_red is also a probability itself – Mattia Iezzi Sep 25 '23 at 10:06
  • @MattiaIezzi I don’t think it is. Why do you think it is? – Dave Sep 25 '23 at 10:07
  • But that’s what our professor told us, I’m very confused – Mattia Iezzi Sep 25 '23 at 10:08
  • Are you, perhaps, converting from $\pm 1$ to $0$ vs $1?$ For instance, before you get to Y_red, can you have a prediction of $-0.5?$ – Dave Sep 25 '23 at 10:10
  • Yes, absolutely. I take P(y=+1|x) in a range between 0 and 1 and then I make a random extraction based on a value between 0 and 1. Question is, what does this mean, and why is it done? – Mattia Iezzi Sep 25 '23 at 10:14
  • Are you in a Bayesian framework where you have some posterior density? – Dave Sep 25 '23 at 10:15
  • No particular info was given. The professor told us it could have also been a classical normal or Gaussian distribution – Mattia Iezzi Sep 25 '23 at 10:16
  • A Gaussian distribution is not a probability. Might you be doing a probit regression and applying the inverse link function? // I really have no idea what’s going on. I think you need clarification from your instructor. As it stands, Tim and I have given good (agreeing) answers to a question it seems you did not ask. – Dave Sep 25 '23 at 10:40