11

Let's say an auction house has $N$ identical copies of some valuable asset and holds an auction to sell them off. It knows that the attending public has some opinions about the value of this lot, which are distributed with a CDF $\Phi$.

The auction house also decides to go for the following procedure: for each item $1\leq i \leq N$ they set some price $x_i$ and ask a random person from the public whether they want to buy it. If this offer is rejected, they again pick a person at random, and they do this until they find a buyer. Hence the probability of a deal happening within $t$ rounds is $1 - \Phi(x_i)^t$.

My question is the following: suppose that I don't know $\Phi$ and instead have only observations $(x_i, t_i)_{i=1}^N$, where $t_i$ is the number of rounds it took to find a buyer at price $x_i$. How can I best estimate $\Phi$ from this data? What if I were only interested in estimating the median of $\Phi$?

I know probability, but I am not very familiar with statistical methods, hence I'm not sure which tag to put. Please feel free to retag.

SBF
  • 425
  • Is the asking price $x$ being set at random from a particular distribution, or is the auction house able to choose an asking price on each round? – Eoin Apr 24 '23 at 08:52
  • @Eoin I guess the former – SBF Apr 24 '23 at 08:59

3 Answers

6

To examine the estimator for $\Phi$ we will treat $\mathbf{x}$ as a control variable, meaning that the auction house sets its own prices for each round of auctions. Without loss of generality, suppose that the values in this control vector are in non-decreasing order and let $\Phi_i \equiv \Phi(x_i)$ denote the corresponding unknown CDF values at the chosen points, so that we have a set of ordered unknown points $0 < \Phi_1 \leqslant \cdots \leqslant \Phi_n < 1$.$^\dagger$ This gives the likelihood function for the data:

$$\begin{align} L_{\mathbf{x},\mathbf{t}}(\Phi) &= \prod_{i=1}^n \text{Geom}(t_i| 1-\Phi(x_i)) \\[6pt] &= \prod_{i=1}^n \Phi(x_i)^{t_i-1}(1-\Phi(x_i)) \\[6pt] &= \prod_{i=1}^n \Phi_i^{t_i-1} (1-\Phi_i) \\[6pt] \end{align}$$

As you can see, the likelihood depends on the distribution $\Phi$ only through the specific points used for the control variable, so without any additional structural assumption on the distribution, you can only learn about the distribution through inference about the CDF at these points.
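
For reference, each factor above is just a geometric probability mass with "success" probability $1-\Phi_i$, so the log-likelihood at a candidate vector of CDF values can be evaluated in one line of R (a small sketch; the helper name loglike_at_points is mine and is not part of the code further down):

#Log-likelihood at candidate CDF values Phi (one value per control price),
#using the geometric pmf with success probability 1 - Phi
loglike_at_points <- function(Phi, t) { sum(dgeom(t - 1, prob = 1 - Phi, log = TRUE)) }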

The MLE can be found by maximising the above function over the constrained space $0 < \Phi_1 \leqslant \cdots \leqslant \Phi_n < 1$. This is a constrained multivariate optimisation problem, which will typically require some transformation and numerical methods. Below I show how you can construct an algorithm to compute the MLE by writing the CDF points as a transformation of an unconstrained parameter vector, thereby converting the problem to an unconstrained optimisation.

Solving to obtain the MLE gives us an estimator of the form $\hat{\boldsymbol{\Phi}} = (\hat{\Phi}_1,...,\hat{\Phi}_n)$, and with some further work we can obtain the estimated asymptotic variance matrix for this estimator (and therefore obtain the standard errors of each of the elements). To estimate the CDF at points other than those used as control variables you could use interpolation, noting that this entails some implicit assumptions about the structure of the distribution (e.g., if you were to estimate the median using linear interpolation from the MLE then this would involve an implicit assumption that the CDF is linear between the relevant control points used in the interpolation). It is also possible to use alternative modelling methods where you assume a particular parametric distributional form for $\Phi$ and then estimate the parameters of the distribution using the MLE or another estimator.
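
For example, once you have a fit from the MLE.auction function constructed below, a rough median estimate under that linear-interpolation assumption could be computed along these lines (a sketch only; the helper estimate_median is mine, and it assumes the value 0.5 falls strictly between the smallest and largest fitted CDF values):

#Estimate the median by linear interpolation of the fitted CDF points
#(fit is the list returned by MLE.auction, defined further down)
estimate_median <- function(fit) {
  cdf_vals <- fit$MLE$MLE                     #fitted CDF values at the control prices
  prices   <- fit$MLE$x                       #control prices (sorted)
  approx(x = cdf_vals, y = prices, xout = 0.5, ties = 'ordered')$y }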


Computing the MLE: As noted above, finding the MLE involves a constrained multivariate optimisation problem, and this can be solved by conversion to an unconstrained multivariate optimisation problem. To facilitate this analysis it is useful to write the parameters $\Phi_1,...,\Phi_n$ as transformations of an unconstrained parameter vector $\boldsymbol{\gamma} = (\gamma_1, ..., \gamma_n) \in \mathbb{R}^n$, given by:$^\dagger$

$$\Phi_i = \frac{\sum_{r=1}^i \exp(\gamma_r)}{1+\sum_{r=1}^n \exp(\gamma_r)}.$$
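
In R this transformation is a one-liner; a small illustration (the helper gamma_to_phi is just for exposition and is not used in the main function below):

#Map an unconstrained vector gamma to non-decreasing CDF values in (0, 1)
gamma_to_phi <- function(gamma) { cumsum(exp(gamma))/(1 + sum(exp(gamma))) }

gamma_to_phi(c(0, -1, 2))    #returns a non-decreasing vector strictly between 0 and 1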

This transformation allows us to write the likelihood function in terms of the parameter $\boldsymbol{\gamma}$ and obtain the MLE through this parameter. If we let $\bar{t}_n \equiv \frac{1}{n} \sum_{i=1}^n t_i$ denote the sample mean of the observed round counts, then we can write the likelihood function as:

$$\begin{align} L_{\mathbf{x},\mathbf{t}}(\boldsymbol{\gamma}) &= \prod_{i=1}^n \bigg( \frac{\sum_{r=1}^i \exp(\gamma_r)}{1+\sum_{r=1}^n \exp(\gamma_r)} \bigg)^{t_i-1} \bigg( 1 - \frac{\sum_{r=1}^i \exp(\gamma_r)}{1+\sum_{r=1}^n \exp(\gamma_r)} \bigg) \\[6pt] &= \prod_{i=1}^n \bigg( \frac{\sum_{r=1}^i \exp(\gamma_r)}{1+\sum_{r=1}^n \exp(\gamma_r)} \bigg)^{t_i-1} \bigg( \frac{1+\sum_{r=i+1}^n \exp(\gamma_r)}{1+\sum_{r=1}^n \exp(\gamma_r)} \bigg) \\[6pt] &= \frac{\prod_{i=1}^n (\sum_{r=1}^i \exp(\gamma_r))^{t_i-1} (1+\sum_{r=i+1}^n \exp(\gamma_r))}{(1+\sum_{r=1}^n \exp(\gamma_r))^{n \bar{t}_n}} \\[6pt] \end{align}$$

and the corresponding log-likelihood is:

$$\begin{align} \ell_{\mathbf{x},\mathbf{t}}(\boldsymbol{\gamma}) &= \sum_{i=1}^n (t_i-1) \log \bigg( \sum_{r=1}^i \exp(\gamma_r) \bigg) + \sum_{i=1}^n \log \bigg( 1 + \sum_{r=i+1}^n \exp(\gamma_r) \bigg) \\[6pt] &\quad \quad \quad - n \bar{t}_n \log \bigg( 1+\sum_{r=1}^n \exp(\gamma_r) \bigg) \\[6pt] &= \sum_{i=1}^n (t_i-1) \text{LSE}(\gamma_1,...,\gamma_i) + \sum_{i=1}^n \text{LSE}(0,\gamma_{i+1},...,\gamma_n) \\[6pt] &\quad \quad \quad - n \bar{t}_n \text{LSE}(0, \gamma_1,...,\gamma_n). \\[6pt] \end{align}$$

(The function $\text{LSE}$ here is the logsumexp function.) The partial derivatives of the log-likelihood (which are the elements of the score function) are:

$$\begin{align} \frac{\partial \ell_{\mathbf{x},\mathbf{t}}}{\partial \gamma_k}(\boldsymbol{\gamma}) &= \sum_{i \geqslant k} \frac{(t_i-1) \exp(\gamma_k)}{\sum_{r=1}^i \exp(\gamma_r)} + \sum_{i < k} \frac{\exp(\gamma_k)}{1+\sum_{r=i+1}^n \exp(\gamma_r)} - \frac{n \bar{t}_n \exp(\gamma_k)}{1+\sum_{r=1}^n \exp(\gamma_r)} \\[12pt] &= \sum_{i \geqslant k} (t_i-1) \exp(\gamma_k - \text{LSE}(\gamma_1,...,\gamma_i)) + \sum_{i < k} \exp(\gamma_k - \text{LSE}(0,\gamma_{i+1},...,\gamma_n)) \\[6pt] &\quad \quad \quad - n \bar{t}_n \exp(\gamma_k - \text{LSE}(0, \gamma_1,...,\gamma_n)) \\[12pt] &= \exp( \text{LSE}(\log(t_k-1) + \gamma_k - \text{LSE}(\gamma_1, ..., \gamma_k) , ..., \log(t_n-1) + \gamma_k - \text{LSE}(\gamma_1,...,\gamma_n)) ) \\[6pt] &\quad \quad + \exp( \text{LSE}(\gamma_k - \text{LSE}(0,\gamma_{2},...,\gamma_n), ..., \gamma_k - \text{LSE}(0,\gamma_{k},...,\gamma_n)) ) \\[6pt] &\quad \quad - n \bar{t}_n \exp(\gamma_k - \text{LSE}(0, \gamma_1,...,\gamma_n)) \\[6pt] \end{align}$$

Setting all partial derivatives to zero and solving for $\boldsymbol{\gamma}$ gives a critical point of the log-likelihood function, which will give the MLE $\hat{\boldsymbol{\gamma}}$. From the invariance property of the MLE we can then easily obtain the MLE $\hat{\boldsymbol{\Phi}}$. In the code below we create a function MLE.auction which computes the MLE for any input vectors x and t (with some other arguments to control the optimisation). The function produces a list giving the MLE and some associated outputs relating to the optimisation.

MLE.auction <- function(x, t, gradtol = 1e-7, steptol = 1e-7, iterlim = 1000) {

  #Check inputs x and t
  if (!is.vector(x))            stop('Error: Input x should be a numeric vector')
  if (!is.numeric(x))           stop('Error: Input x should be a numeric vector')
  if (!is.vector(t))            stop('Error: Input t should be a numeric vector')
  if (!is.numeric(t))           stop('Error: Input t should be a numeric vector')
  if (any(t != as.integer(t)))  stop('Error: Input t should contain only integers')
  if (min(t) < 1)               stop('Error: Input t should contain only positive integers')
  if (length(x) != length(t))   stop('Error: Inputs x and t should have the same length')

  #Check input gradtol
  if (!is.vector(gradtol))      stop('Error: Input gradtol should be numeric')
  if (!is.numeric(gradtol))     stop('Error: Input gradtol should be numeric')
  if (length(gradtol) != 1)     stop('Error: Input gradtol should be a single numeric value')
  if (min(gradtol) <= 0)        stop('Error: Input gradtol should be positive')

  #Check input steptol
  if (!is.vector(steptol))      stop('Error: Input steptol should be numeric')
  if (!is.numeric(steptol))     stop('Error: Input steptol should be numeric')
  if (length(steptol) != 1)     stop('Error: Input steptol should be a single numeric value')
  if (min(steptol) <= 0)        stop('Error: Input steptol should be positive')

  #Check input iterlim
  if (!is.vector(iterlim))      stop('Error: Input iterlim should be numeric')
  if (!is.numeric(iterlim))     stop('Error: Input iterlim should be numeric')
  if (length(iterlim) != 1)     stop('Error: Input iterlim should be a single numeric value')
  if (iterlim != as.integer(iterlim)) stop('Error: Input iterlim should be an integer')
  if (min(iterlim) <= 0)        stop('Error: Input iterlim should be positive')

  #Set preliminary quantities (requires the matrixStats package for logSumExp)
  ORD  <- order(x)
  x    <- x[ORD]
  t    <- t[ORD]
  n    <- length(t)
  tbar <- mean(t)

  #Set negative log-likelihood function (with gradient attribute)
  NEGLOGLIKE <- function(gamma) {
    TT <- matrixStats::logSumExp(c(0, gamma[1:n]))
    T1 <- rep(0, n)
    T2 <- rep(0, n)
    for (i in 1:n) {
      T1[i] <- matrixStats::logSumExp(gamma[1:i])
      if (i < n) { T2[i] <- matrixStats::logSumExp(c(0, gamma[(i+1):n])) } }
    NLL <- - sum((t-1)*T1) - sum(T2) + n*tbar*TT

    #Set derivative
    SS <- n*tbar*exp(gamma - TT)
    S1 <- rep(0, n)
    S2 <- rep(0, n)
    for (k in 1:n) {
      S1[k] <- exp(matrixStats::logSumExp(log(t[k:n]-1) + gamma[k] - T1[k:n]))
      if (k > 1) { S2[k] <- exp(matrixStats::logSumExp(gamma[k] - T2[1:(k-1)])) } }
    DERIV <- - S1 - S2 + SS
    attr(NLL, 'gradient') <- DERIV

    #Give output
    NLL }

  #Compute optima
  OPT <- nlm(NEGLOGLIKE, p = rep(0, n),
             gradtol = gradtol, steptol = steptol, iterlim = iterlim)

  #Convert to MLE for phi
  GAMMA.MLE  <- OPT$estimate
  MAXLOGLIKE <- -OPT$minimum
  TTT <- matrixStats::logSumExp(c(0, GAMMA.MLE))
  TT1 <- rep(0, n)
  for (i in 1:n) { TT1[i] <- matrixStats::logSumExp(GAMMA.MLE[1:i]) }
  PHI.MLE <- exp(TT1 - TTT)
  MLE.OUT <- data.frame(x = x, t = t, MLE = PHI.MLE)
  rownames(MLE.OUT) <- sprintf('Phi[%s]', 1:n)

  #Give output
  list(MLE = MLE.OUT, maxloglike = MAXLOGLIKE, mean.maxloglike = MAXLOGLIKE/n,
       code = OPT$code, iterations = OPT$iterations,
       gradtol = gradtol, steptol = steptol, iterlim = iterlim) }

We can test this function on the set of input data $\mathbf{x} = (4, 6, 12, 15, 21)$ and $\mathbf{t} = (2, 7, 6, 12, 15)$. As you can see from the output, the computed MLE respects the CDF ordering for the input values and it also tells you the maximised value of the log-likelihood function.

#Set data
x <- c(4, 6, 12, 15, 21)
t <- c(2, 7, 6, 12, 15)

#Compute MLE
MLE.auction(x = x, t = t)

$MLE
        x  t       MLE
Phi[1]  4  2 0.5000001
Phi[2]  6  7 0.8461538
Phi[3] 12  6 0.8461538
Phi[4] 15 12 0.9166667
Phi[5] 21 15 0.9333334

$maxloglike
[1] -14.08348

$mean.maxloglike
[1] -2.816695

...


$^\dagger$ Here we use the simplifying assumption that $\Phi_1 > 0$ and $\Phi_n < 1$. It is simple to proceed without this simplifying assumption by allowing $\boldsymbol{\gamma} \in \bar{\mathbb{R}}^n$, which is the extended Euclidean space.

Ben
  • 124,856
  • Thanks, interesting. So that was a non parametric MLE, correct? For a parametric MLE things should be simpler? – SBF Apr 25 '23 at 04:48
  • 1
    Yes, but whether a parametric MLE is simpler would depend on the parametric form. – Ben Apr 25 '23 at 05:00
  • 1
    Thanks. I've accepted your answer and awarded the bounty. I want to ask about iterative implementation of the MLE search, should I do this in a new answer or a comment here suffices? – SBF Apr 26 '23 at 06:45
  • If it is a simple and specific question closely related to this answer then ask it here, but if it is a general question, best to ask in a new question (and link this question if needed). – Ben Apr 26 '23 at 06:47
  • Sure, I think this question may be worth a separate answer, so I've asked it here https://stats.stackexchange.com/questions/614168/iterative-mle-possibly-changing-distribution Please give me your feedback on it whenever you have some time – SBF Apr 26 '23 at 07:39
5

As you pointed out, each observation $(x_i,t_i)$ gives you information on the value of the cumulative distribution at the point $x_i$; however, notice that the probability of this observation is $\Phi(x_i)^{t_i-1}(1-\Phi(x_i))$ (the first $t_i-1$ offers rejected and the last one accepted).

So, without additional assumptions on $\Phi$ (such as smoothness), you can only directly estimate from the data the set of values $\{\Phi(x_i)\}$. Now there are many possible methods of statistical inference, but most are based on the likelihood function, so that's a good place to start. Denote by $p_i = \Phi(x_i)$ the set of unknown parameters and assume that the $x_i$'s are sorted, so that $0 \le p_1 \le p_2 \le ...\le p_n \le 1$. The log-likelihood function is then

$$\log \mathcal L(p_1,...,p_n) = \sum_{i=1}^n \Big[ (t_i-1)\log p_i + \log(1-p_i) \Big].$$

The constraint that the $p_i$'s are non-decreasing makes this a slightly less trivial problem, but we can still find the maximum likelihood estimators (MLE's) with a bit of work. The unconstrained MLE's are easily found by setting the respective derivatives of the log-likelihood to zero:

$$ \frac{\partial \log \mathcal L}{\partial p_i} = \frac{t_i - 1}{p_i} - \frac{1}{1-p_i}=0 $$ which gives

$$ \hat p_i = \frac{t_i-1}{t_i}.$$

If this set of MLE's satisfies the constraint $0 \le \hat p_1 \le \hat p_2 \le ... \le \hat p_n \le 1$ then we are done. Otherwise, since there is only a single local maximum in the interior of the parameter space, the global maximum must be on the boundary, namely there must be some $i$ such that $\hat p_i = \hat p_{i+1}$. One can test all $n-1$ adjacent pairs, but it is quite clear that the highest value of the likelihood will be achieved by applying this to a pair that is in the "wrong" order. Applying this additional constraint only affects the terms in the likelihood involving $p_i$ and $p_{i+1}$, so the equation for the MLE becomes:

$$ \frac{\partial \log \mathcal L}{\partial p_i} = \frac{t_i + t_{i+1}-2}{p_i} - \frac{2}{1-p_i}=0 $$

This is, understandably, the same equation we got before, just with the average $(t_i+t_{i+1})/2$ replacing $t_i$.

Repeating this procedure for all out-of-order pairs will give us the constrained MLE's (see the sketch below). If there are three or more consecutive estimators that are out of order, the number of possibilities we need to test becomes larger. In general there are $2^{n-1}$ ways of choosing which consecutive pairs of estimators are set equal, so this might become infeasible if all the estimators are completely out of order, but in that case we probably can't say much about $\Phi(x)$ anyway.
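
One way to mechanise the repeated pooling described above, without enumerating all the possibilities, is a pool-adjacent-violators style sweep: repeatedly merge adjacent blocks whose pooled means of $t$ are out of order, then apply $(\bar t - 1)/\bar t$ within each block. This is a sketch under that interpretation, not code from the answer:

#Constrained MLE's by pooling adjacent out-of-order blocks of t values
pooled_mle <- function(t) {
  block_sum  <- as.numeric(t)          #total of t within each block
  block_size <- rep(1, length(t))      #number of observations in each block
  i <- 1
  while (i < length(block_sum)) {
    if (block_sum[i]/block_size[i] > block_sum[i + 1]/block_size[i + 1]) {
      #merge blocks i and i+1, then step back in case the merge created a new violation
      block_sum[i]  <- block_sum[i] + block_sum[i + 1]
      block_size[i] <- block_size[i] + block_size[i + 1]
      block_sum  <- block_sum[-(i + 1)]
      block_size <- block_size[-(i + 1)]
      i <- max(i - 1, 1)
    } else { i <- i + 1 } }
  block_mean <- block_sum/block_size
  rep((block_mean - 1)/block_mean, times = block_size) }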

Finding the MLE's is just the tip of the iceberg. You may also want to estimate the uncertainty of the estimators, add assumptions on the shape of the distribution $\Phi$, and so on. Describing all the methods for doing this would require a full course in statistics. The particular methods that are best suited to your case will depend on the nature of your assumptions and data, and what purpose you want to use it for.

UPDATE

As a simple example we can apply this to the same input data as given in @Ben's answer: $t=(2,7,6,12,15)$. There is one pair in the wrong order, $(7,6)$, so we conclude that $p_2=p_3$, and we proceed by replacing those values with their average $6.5$ and simply calculating $(t-1)/t$ for the set $(2,6.5,6.5,12,15)$: this results in $\hat p = (0.5000, 0.8462, 0.8462, 0.9167, 0.9333)$, in complete agreement with @Ben's numerical calculation.
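
In R, this hand calculation is just a couple of lines (reproducing the pooling described above for this particular data set):

t <- c(2, 7, 6, 12, 15)
t_pooled <- c(2, 6.5, 6.5, 12, 15)    #pool the out-of-order pair (7, 6) into its average
round((t_pooled - 1)/t_pooled, 4)     #0.5000 0.8462 0.8462 0.9167 0.9333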

J. Delaney
  • 5,380
  • Thanks a lot! Your answer was very informative, just the other one included a procedure I found a bit more elaborate. Wish I could split the bounty between the two – SBF Apr 26 '23 at 06:44
  • Nice answer (+1). Since this gives an analytic solution, it would be great to confirm that parameters are also equal in the case where there are more than two out of order. This appears to be the case from your working, but would be good to give more detail. – Ben Apr 26 '23 at 06:48
0

In general, problems like this can be solved by writing out the likelihood function: the likelihood of seeing a particular set of observations as a function of the parameters of the CDF, which I presume is Gaussian with parameters $\mu$ and $\sigma$.

In R, this would look as follows

likelihood_function = function(parameters, asking_price, number_asked){
  mu = parameters[1]
  sigma = parameters[2]
  # Probability of any one person accepting the asking price
  p_accept = pnorm(asking_price, mu, sigma, lower.tail = T)
  # Number of people asked BEFORE someone accepts 
  # follows geometric distribution (pdf = p_accept * (1 - p_accept)^number),
  number_of_rejections = number_asked - 1
  loglikelihood = dgeom(number_of_rejections, p_accept, log = T)
  loglikelihood_adj = loglikelihood + 1e-6 # Add tiny correction to prevent underflow
  sum(loglikelihood_adj)
}

You can then use optimisation to find the parameters that maximise this function: the maximum-likelihood parameters.

starting_values = c(mu = 50, sigma = 100)
result = optim(starting_values, likelihood_function, 
               asking_price = data$asking_price,
               number_asked = data$number_asked,
               control = list(fnscale = -1)) # Find max, not min
result$par

You can find plenty of information on this site on how to calculate the uncertainty around these estimates, e.g. using Fisher information, or Bayesian methods.
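
For instance, one common approach (a sketch only; it assumes the numDeriv package is installed and uses the likelihood_function and result objects from above) is to invert the negative Hessian of the log-likelihood at the maximum:

# Approximate covariance matrix from the observed information (negative Hessian)
H = numDeriv::hessian(likelihood_function, result$par,
                      asking_price = data$asking_price,
                      number_asked = data$number_asked)
vcov_hat = solve(-H)
se = sqrt(diag(vcov_hat))  # approximate standard errors for mu and sigma
se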

While we're here, the code below simulates data from the model you describe.

library(tidyverse)
true_pars = c(mu = 10, sigma = 10)

simulate_trials = function(n_trials, mu, sigma){
  # Distribution of asking prices is uniform on [0, 100]
  asking_price = round(runif(n_trials, 0, 100))
  # Probability of any one person accepting the asking price
  p_accept = pnorm(asking_price, mu, sigma, lower.tail = T)
  # Number of people asked BEFORE someone accepts
  # follows a geometric distribution (pdf = p_accept * (1 - p_accept)^number),
  number_of_rejections = rgeom(n_trials, prob = p_accept)
  # so the total number asked is that + 1
  number_asked = number_of_rejections + 1
  data.frame(asking_price, p_accept, number_asked) %>% arrange(asking_price)
}

data = simulate_trials(1000, true_pars[1], true_pars[2])

ggplot(data, aes(asking_price, p_accept)) + geom_path()

ggplot(data, aes(asking_price, number_asked)) + stat_smooth() + scale_y_log10()

[Plots: p_accept and number_asked against asking_price for the simulated data]

Eoin
  • 8,997
  • Something off here. The number asked should go up as the asking price increases – Lukas Lohse Apr 24 '23 at 18:16
  • Thanks for the answer. That would be different from doing a Bayesian inference on a prior estimate of the distribution? – SBF Apr 24 '23 at 18:23
  • @LukasLohse, whoops, you're right, I've done willingness to sell rather than willingness to buy. Changing lower.tail = T to F would fix this, but since I don't have time to fix the plot I won't yet. – Eoin Apr 25 '23 at 08:02