Kindly help me understand the difference between a distribution model and a regression model as it relates to randomness.

  • Have you searched our site yet for answers? See, for instance, https://stats.stackexchange.com/questions/233013/ for a general description of regression and https://stats.stackexchange.com/questions/148638 for some formal descriptions of various forms of regression models. That might help you refine and clarify your question. – whuber Mar 05 '24 at 16:22

1 Answer

Actually, a regression model is a type of distribution model. Let me explain.

Suppose you have some binary data, $y_1, y_2, \ldots, y_n$. These could be coin flips, deaths within some time frame, or anything else with two outcomes. We can think of these as coming from some distribution. Since they are binary, we can think of each observation as coming from a Bernoulli distribution (alternatively, we could model the total number of successes with a binomial distribution):

$$ y_i \sim \operatorname{Bernoulli}(\theta) $$

Here, $\theta$ is the probability of seeing a success (e.g. heads on a coin flip, or a death). So we have one distribution that describes all the data. What if, along with each $y$, we also had some other information $x$? In the coin flip case, maybe $x$ records whether there was a gust of wind at the moment the coin was flipped. In the death case, maybe it is the person's status as a smoker. We can then fit a regression of $y$ on $x$. Since the data are binary, a natural choice is logistic regression:

$$ y_i \mid x_i \sim \operatorname{Bernoulli}(\theta(x_i)) $$

We still have a distribution that models the observation, but we have a slightly different distribution dependent on what the value of $x$ is. Maybe the chance of heads is lower when there is a gust of wind, maybe smokers are more likely to die in the study period, etc.
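To make the contrast concrete, here is a minimal simulation sketch. The probabilities 0.5 and 0.3, the sample size, and the helper names `bernoulli` and `theta_of_x` are assumptions chosen purely for illustration; they are not estimates from any real data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def bernoulli(theta):
    """Draw one 0/1 outcome with success probability theta."""
    return 1 if random.random() < theta else 0


def theta_of_x(x):
    """Hypothetical success probability given x (values are made up)."""
    return 0.3 if x == 1 else 0.5


def mean(values):
    return sum(values) / len(values)


n = 20000

# Distribution model: a single theta describes every observation.
y_marginal = [bernoulli(0.5) for _ in range(n)]

# Regression model: theta depends on x (e.g. x = 1 means a gust of wind).
x = [bernoulli(0.5) for _ in range(n)]
y_conditional = [bernoulli(theta_of_x(xi)) for xi in x]

mean_marginal = mean(y_marginal)
mean_x0 = mean([yi for xi, yi in zip(x, y_conditional) if xi == 0])
mean_x1 = mean([yi for xi, yi in zip(x, y_conditional) if xi == 1])

print("one theta for all data:", round(mean_marginal, 3))
print("mean of y when x = 0:  ", round(mean_x0, 3))
print("mean of y when x = 1:  ", round(mean_x1, 3))
```

In both cases each $y_i$ is still a random Bernoulli draw. The only difference is that in the second case the success probability is allowed to vary with $x$, so the sample mean within each group lands near that group's own $\theta(x)$.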

Regression defines a conditional distribution of the outcome. Conditional on what? Conditional on the values of $x$. Each stratum defined by $x$ gets its own mean, and how that mean changes as a function of $x$ is what we get to specify as the modeller.
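As a sketch of one such specification, logistic regression with a single predictor takes the mean to be the following, where the coefficients $\beta_0$ and $\beta_1$ are notation introduced here for illustration:

$$ \operatorname{E}[y_i \mid x_i] = \theta(x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}, \qquad \text{equivalently} \qquad \log\frac{\theta(x_i)}{1 - \theta(x_i)} = \beta_0 + \beta_1 x_i $$

If $x_i$ is binary (say, smoker or non-smoker), this gives exactly two strata, with means $\theta(0) = 1/(1 + e^{-\beta_0})$ and $\theta(1) = 1/(1 + e^{-(\beta_0 + \beta_1)})$. Setting $\beta_1 = 0$ recovers the single-$\theta$ distribution model as a special case.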