Can anyone give the intuition behind the relationship between these two? I see a lot of proofs in books, but no real intuition.
Thanks
Suppose a customer is facing a decision between buying ($y=1$) and not buying ($y=0$). The utilities of the two options are $$ \begin{array}{l} U_1 = \beta_1 \cdot x + \varepsilon_1 \\ U_0 = \beta_0 \cdot x + \varepsilon_0, \end{array} $$ where $x$ is some characteristic of the customer, like age, gender, or income. The $\varepsilon$s are all the things the econometrician doesn't get to observe, like a good-looking sales clerk who flirts with our agent ($\varepsilon_1>0$), a nasty smell in the store ($\varepsilon_1<0$), or a sunny day at the beach ($\varepsilon_0>0$) that makes the outside option look sweeter. These are the random components of utility, which we need in order to explain why seemingly identical people make different decisions. The econometrician also doesn't get to see the utilities, only the binary outcomes $y$ that come from utility maximization and the characteristics $x$ of the consumer.
The agent makes a purchase $(y=1)$ when $U_1>U_0$, which can be expressed as $$ (\beta_1-\beta_0) \cdot x + (\varepsilon_1 - \varepsilon_0)>0, $$ or, equivalently, as $$ d_{\varepsilon} = (\varepsilon_0-\varepsilon_1) < (\beta_1-\beta_0) \cdot x=b \cdot x, $$ where $b = \beta_1-\beta_0$.
If the $\varepsilon$s are independently and identically distributed as type 1 extreme value (Gumbel), their difference $d_{\varepsilon}$ follows a logistic distribution. Now we can ask the question: what is the probability that a logistic variable is less than some number $b \cdot x$, so that the customer buys? To answer that we use the logistic CDF, and get the logistic model:
$$ \mathbf{Pr}(U_1>U_0 \vert x)=\mathbf{Pr}(y=1 \vert x) = \mathbf{Pr} (d_{\varepsilon}<b \cdot x)=F(b \cdot x)=\frac{\exp(b \cdot x)}{1+\exp(b \cdot x)}. $$
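If you'd rather see this than take it on faith, here is a rough simulation sketch. It draws type 1 extreme value (Gumbel) shocks for both options, lets the agent pick the higher utility, and compares the share of purchases with the logistic formula above. The values of `beta1`, `beta0`, and `x`, the sample size, and the function names are made up for illustration.

```python
import numpy as np

def logistic_cdf(z):
    """Logistic CDF, F(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate_purchase_rate(beta1, beta0, x, n=200_000, seed=0):
    """Share of simulated customers who buy, given i.i.d. Gumbel shocks."""
    rng = np.random.default_rng(seed)
    eps1 = rng.gumbel(size=n)  # unobserved shock to the utility of buying
    eps0 = rng.gumbel(size=n)  # unobserved shock to the utility of not buying
    u1 = beta1 * x + eps1
    u0 = beta0 * x + eps0
    return np.mean(u1 > u0)    # the agent buys whenever U_1 > U_0

# Hypothetical values, chosen only for illustration.
beta1, beta0, x = 1.0, 0.2, 1.5
simulated = simulate_purchase_rate(beta1, beta0, x)
predicted = logistic_cdf((beta1 - beta0) * x)
print(f"simulated share of buyers: {simulated:.4f}")
print(f"logistic CDF at b*x:       {predicted:.4f}")
```

The two printed numbers should agree up to simulation noise, and only the difference $b = \beta_1 - \beta_0$ matters for the result.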
Of course, the econometrician doesn't know $b$. But if there's data on more than one person, $b$ can be estimated (at least up to scale).
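To make the estimation step and the "up to scale" caveat concrete, here is a minimal sketch under simplifying assumptions of my own: a single regressor, no intercept, and a hand-rolled Newton-Raphson maximum likelihood routine (`fit_logit_slope` is a hypothetical helper, not a library function). The data are simulated from the latent utility story, once with standard Gumbel shocks and once with shocks that are twice as large.

```python
import numpy as np

def fit_logit_slope(x, y, n_iter=25):
    """Maximum likelihood estimate of b in Pr(y=1|x) = F(b*x), via Newton-Raphson."""
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-b * x))
        score = np.sum((y - p) * x)            # first derivative of the log-likelihood
        info = np.sum(p * (1.0 - p) * x ** 2)  # minus the second derivative
        b += score / info
    return b

def simulate_choices(b, x, sigma, rng):
    """Binary choices from the latent utility model with Gumbel shocks of scale sigma."""
    d_eps = rng.gumbel(scale=sigma, size=x.size) - rng.gumbel(scale=sigma, size=x.size)
    return (d_eps < b * x).astype(float)       # y = 1 when eps0 - eps1 < b*x

rng = np.random.default_rng(1)
true_b = 0.8                                   # hypothetical value for illustration
x = rng.normal(size=50_000)

b_hat_unit = fit_logit_slope(x, simulate_choices(true_b, x, 1.0, rng))
b_hat_noisy = fit_logit_slope(x, simulate_choices(true_b, x, 2.0, rng))
print(f"shock scale 1: b_hat = {b_hat_unit:.3f}")
print(f"shock scale 2: b_hat = {b_hat_noisy:.3f}")
```

With unit-scale shocks the estimate should land near `true_b`. With shocks twice as large it should land near `true_b / 2`, because the choices only reveal the ratio of $b$ to the scale of the unobservables. That is what "up to scale" means here.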
Different assumptions about the $\varepsilon$s' distributions give rise to different binary choice models. For instance, assuming normality leads to the probit model.
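The same simulation idea works for the probit case. This sketch assumes, for illustration, that the two $\varepsilon$s are independent standard normals, so their difference is normal with variance 2 and the purchase probability is $\Phi(b \cdot x / \sqrt{2})$. The values of `b` and `x` are again made up.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_purchase_rate_normal(b, x, n=200_000, seed=2):
    """Share of buyers when both utility shocks are i.i.d. standard normal."""
    rng = np.random.default_rng(seed)
    eps1 = rng.normal(size=n)
    eps0 = rng.normal(size=n)
    return np.mean(eps0 - eps1 < b * x)  # buy when d_eps < b*x

b, x = 0.8, 1.5                          # hypothetical values for illustration
simulated_probit = simulate_purchase_rate_normal(b, x)
# Var(eps0 - eps1) = 2, so standardize by sqrt(2) before applying the normal CDF.
predicted_probit = normal_cdf(b * x / math.sqrt(2.0))
print(f"simulated share of buyers: {simulated_probit:.4f}")
print(f"probit prediction:         {predicted_probit:.4f}")
```

The decision rule is identical to the logit case. Only the distribution assumed for the unobservables, and hence the CDF used to turn $b \cdot x$ into a probability, changes.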