I am reading Dupuy & Galichon (2014), which extends the estimation of the matching model in Choo & Siow (2006) to continuous types. They build the continuous logit model on insights of Cosslett (1988) and Dagsvik (1994), who "have independently suggested using max-stable processes to model continuous choice."
The details of the continuous logit model are described in Appendix A:
"In this paragraph, we expound the main ideas of Cosslett (1988) and Dagsvik (1994), who show how to obtain a continuous version of the multinomial logit model. Assume that $\lbrace\left(y_k^m, \varepsilon_k^m\right), k \in \mathbb{N} \rbrace$ are the points of a Poisson point process on $\mathcal{Y} \times \mathbb{R}$ of intensity $d y \times e^{-\varepsilon} d \varepsilon$. We recall that this implies that for $S$ a subset of $\mathcal{Y} \times \mathbb{R}$, the probability that man $m$ has no acquaintance in set $S$ is $\exp \left(-\int_S e^{-\varepsilon} d y d \varepsilon\right)$. From (2), man $m$ chooses woman $k$ among his acquaintances such that his utility is maximized; that is, man $m$ solves $$\max_k \lbrace U\left(x,y_k^m\right)+\varepsilon_k^m \rbrace$$.
Letting $Z$ be the value of this maximum, one has for any $c \in \mathbb{R}$ $$ \operatorname{Pr}(Z \leq c)=\operatorname{Pr}\left(U\left(x, y_k^m\right)+\varepsilon_k^m \leq c \forall k\right), $$ which is exactly the probability that the Poisson point process $\left(y_k^m, \varepsilon_k^m\right)$ has no point in $\{(y, \varepsilon): U(x, y)+\varepsilon>c\}$; thus $$ \begin{aligned} \log \operatorname{Pr}(Z \leq c) & =-\iint_{\mathcal{Y} \times \mathbb{R}} 1(U(x, y)+\varepsilon>c) d y e^{-\varepsilon} d \varepsilon \\ & =-\int_{\mathcal{Y}} \int_{c-U(x, y)} e^{-\varepsilon} d \varepsilon d y \\ & =-\int_{\mathcal{Y}} e^{-c+U(x, y)} d y \\ & =-\exp \left[-c+\log \int_{\mathcal{Y}} \exp U(x, y) d y\right] \end{aligned} $$
Hence $Z$ is a $\left(\log \int_{\mathcal{Y}} \exp U(x, y) d y, 1\right)$ Gumbel. In particular, $\mathbb{E}\left[\max_k \lbrace U\left(x, y_k^m\right)+\varepsilon_k^m \rbrace \right]=\log \int_{\mathcal{Y}} \exp U(x, y) d y,$
and the choice probabilities are given by their density with respect to the Lebesgue measure
$$\pi(y \mid x)=\exp [U(x, y)] /\left[\int_{\mathcal{Y}} \exp U\left(x, y^{\prime}\right) d y^{\prime}\right] .$$
The same logic also implies that $\lbrace \varepsilon_k: k \in \mathbb{N} \rbrace$ has a Gumbel distribution. Indeed, the probability that this Poisson point process has no element in the set $\{\varepsilon: \varepsilon>c\}$ is equal to $$ \exp \left(-\int_c^{+\infty} e^{-\varepsilon} d \varepsilon\right)=\exp [-\exp (-c)] $$ which is equivalent to saying that $\operatorname{Pr}\left(\max _{k \in \mathbb{N}} \varepsilon_k \leq c\right)=\exp [-\exp (-c)]$. Finally, note that a similar argument would show that $m$ has almost surely an infinite, though countable, number of acquaintances, as announced. "
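To convince myself of the claimed law of $Z$, I also ran a quick Monte Carlo check. The setup here is my own toy choice (not from the paper): $\mathcal{Y} = [0,1]$, $x$ fixed, $U(x,y) = y$, and the Poisson process truncated at $\varepsilon > -M$, since points with very negative $\varepsilon$ essentially never attain the maximum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (my own choices, purely illustrative): Y = [0, 1], x fixed, U(x, y) = y.
U = lambda y: y

# The intensity dy x e^{-eps} d eps has infinite total mass, so we truncate at
# eps > -M; low-eps points almost never win the max, so the bias is negligible.
M = 8.0
n_draws = 4000

Z = np.empty(n_draws)
for i in range(n_draws):
    # Number of points with eps > -M is Poisson with mean |Y| * e^M.
    K = rng.poisson(np.exp(M))
    y = rng.uniform(0.0, 1.0, size=K)        # marginal intensity dy on Y
    eps = -M + rng.exponential(1.0, size=K)  # density prop. to e^{-eps} on (-M, inf)
    Z[i] = np.max(U(y) + eps)

# Theory: Z ~ Gumbel(mu, 1) with mu = log int_0^1 e^y dy = log(e - 1),
# so E[Z] = mu + gamma (Euler-Mascheroni) and sd[Z] = pi / sqrt(6).
mu = np.log(np.e - 1.0)
gamma = 0.5772156649
print(Z.mean(), mu + gamma)            # empirical vs. theoretical mean
print(Z.std(), np.pi / np.sqrt(6.0))   # empirical vs. theoretical sd
```

The empirical mean and standard deviation come out close to the Gumbel values, so the derivation up to this point checks out numerically.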
///
I think I fully understand the derivation up to "$Z$ is a Gumbel". But then I got stuck on deriving what is perhaps the most important equation of the logit model: $$ \pi(y \mid x)=\exp [U(x, y)] /\left[\int_{\mathcal{Y}} \exp U\left(x, y^{\prime}\right) d y^{\prime}\right]$$ . I don't see how it follows from the previous derivation.
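To be clear, I have verified the formula numerically, so my question is only about the analytic step. With the same toy setup as above (my own choices: $\mathcal{Y}=[0,1]$, $U(x,y)=y$, truncation at $\varepsilon > -M$), the distribution of the chosen $y$ does match $\pi(y\mid x) = e^{U(x,y)}/\int e^{U(x,y')}dy'$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative setup: Y = [0, 1], U(x, y) = y, process truncated at eps > -M.
U = lambda y: y
M = 8.0
n_draws = 4000

y_star = np.empty(n_draws)  # the chosen y (argmax of U + eps) in each draw
for i in range(n_draws):
    K = rng.poisson(np.exp(M))
    y = rng.uniform(0.0, 1.0, size=K)
    eps = -M + rng.exponential(1.0, size=K)
    y_star[i] = y[np.argmax(U(y) + eps)]

# pi(y|x) = e^y / (e - 1) implies Pr(y* > 1/2) = (e - e^{1/2}) / (e - 1) ~ 0.622.
pred = (np.e - np.exp(0.5)) / (np.e - 1.0)
print(np.mean(y_star > 0.5), pred)  # empirical vs. predicted probability
```

The empirical frequency agrees with the predicted probability, so the formula is surely right; I just cannot reproduce the derivation.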
///
I even checked one of the cited papers, Dagsvik (1994), and found in its appendix (Proof of Theorem 4) a similar derivation (from A.8 to A.9), but again without any further explanation. In case anyone is interested, the equations there are "(A.8) $$ \begin{gathered} P\left(\sup _{T(z) \in A,(T(z), E(z)) \in H, z \in Z}(\hat{v}(\hat{p}(T(z)), T(z), K)+E(z)) \leqslant y\right) \\ = \begin{cases}\exp \left\{-e^{-y} \mu \int_A \exp (\hat{v}(\hat{p}(t), t, K)) G(d t)\right\} & \text { for } y \geqslant c, \\ 0 & \text { for } y<c .\end{cases} \end{gathered} $$
From (A.8) we get (A.9) $$ \begin{gathered} P\left(\sup _{T(z) \in A,(T(z), E(z)) \in H, z \in Z}(\hat{v}(\hat{p}(T(z)), T(z), K)+E(z))\right. \\ \left.>\sup _{T(z) \in D-A,(T(z), E(z)) \in H, z \in Z}(\hat{v}(\hat{p}(T(z)), T(z), K)+E(z))\right) \\ \quad=\frac{\int_{u \leqslant t, u \in D} \exp (\hat{v}(\hat{p}(u), u, K)) G(d u)}{\int_D \exp (\hat{v}(\hat{p}(u), u, K)) G(d u)} \cdot\left(1-\exp \left(-\tilde{\Lambda}_c\right)\right), \end{gathered} $$ where $$ \tilde{\Lambda}_c \equiv \mu e^{-c} \int_D \exp (\hat{v}(\hat{p}(t), t, K)) G(d t) . $$
Since $\tilde{\Lambda}_c$ is the expected number of Poisson points in $H \cap(D \times R)$, the probability that $H \cap(D \times R)$ is nonempty equals $$ 1-\exp \left(-\tilde{\Lambda}_c\right) \text {. } $$"