
Is there a way to fit a specified distribution if you are only given a few quantiles?

For example, if I told you I have a gamma distributed data set, and the empirical 20%, 30%, 50% and 90%-quantiles are, respectively:

      20%       30%       50%       90% 
0.3936833 0.4890963 0.6751703 1.3404074 

How would I go about estimating the parameters? Are there multiple ways to do that, or is there already a specific procedure?

Edit: I'm not asking specifically about the gamma distribution; that was just an example, because I worried I couldn't explain my question clearly. My task is that I have a few (2-4) given quantiles and want to estimate the (1-3) parameters of a few distributions as "closely" as possible. Sometimes there is an exact solution (or infinitely many), sometimes not, right?
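One straightforward approach (the quadratic-loss fit discussed in the comments below) is to minimize the squared distance between the theoretical quantile function and the given empirical quantiles. A minimal Python/scipy sketch, using the probability levels and values from the question; the log-parameterization is just one way to keep shape and scale positive:

```python
# Least-squares fit of a gamma distribution to a few empirical quantiles.
# We minimize the sum of squared differences between the theoretical
# quantile function (gamma.ppf) and the given empirical quantiles,
# optimizing over log-parameters so shape and scale stay positive.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

probs = np.array([0.2, 0.3, 0.5, 0.9])
quants = np.array([0.3936833, 0.4890963, 0.6751703, 1.3404074])

def loss(log_params):
    shape, scale = np.exp(log_params)
    return np.sum((gamma.ppf(probs, shape, scale=scale) - quants) ** 2)

res = minimize(loss, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(res.x)
print(shape_hat, scale_hat)  # roughly (3.1, 0.24)
```

With two parameters and four quantiles this is an overdetermined fit, so the residuals will generally not be exactly zero; the comments below discuss why plain quadratic loss may not be the best choice.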

  • I voted to close this as a duplicate of http://stats.stackexchange.com/questions/6022, but then it occurred to me that there are possible interpretations of this question that make it different in an interesting way. As a purely mathematical question--if someone teasingly gives you a few quantiles of a mathematical distribution--this is without statistical interest and belongs on the math site. But if these quantiles are measured in a dataset, then generally they will not exactly correspond to the quantiles of any gamma distribution and we need to find the "best" fit in some sense. – whuber Jun 11 '12 at 12:19
  • So, after that long introductory comment, which situation are you in, Alexx? Should we send your question over to the math people for a theoretical answer, or are these quantiles derived from data? If the latter, then could you help us understand what a "good" (or a "best") solution would look like? E.g., should the fitted distribution match some of the quantiles better than some of the others when a perfect fit is not possible? – whuber Jun 11 '12 at 12:20
  • But actually the second answer (by @mpiktas) in the link you posted estimates the distribution even if your quantiles are not exact (derived from the data). – Dmitry Laptev Jun 11 '12 at 12:35
  • This is definitely a different question, as it is specific to the gamma rather than the lognormal. Using the data to fit a model based on quantiles or order statistics seems to me to be a reasonable thing to do, even though not efficient. If the situation is like I mentioned with the Gupta reference, where all the order statistics are available, you can get reasonable parameter estimates. If it is just solving two equations in two unknowns based on a couple of quantiles, then the estimates will be very poor, as pointed out by Huber and others in the other stackexchange question cited here by whuber. – Michael R. Chernick Jun 11 '12 at 12:37
  • @John The solution by mpiktas uses quadratic loss. I am suggesting that this loss function might not be appropriate generally. Michael, the question of fitting a distribution given percentiles is essentially the same regardless of its formula; e.g., the method given by mpiktas in the related question will work as well for the gamma as for the lognormal, mutatis mutandis. – whuber Jun 11 '12 at 12:41
  • whuber, thanks for your request for clarification. I forgot to add that I meant the empirical quantiles of a data set. The question has been edited. A "good" fit would just be one with as little discrepancy as possible, but I don't have a specific loss function in mind, so maybe we should just use the "default" squared loss or something? I do not have specifics in mind – Alexander Engelhardt Jun 11 '12 at 13:01
  • Using the quadratic loss function you get a good fit. The estimators I obtained using this are $(\hat\alpha,\hat\beta)=(3.097, 0.244)$. – Procrastinator Jun 11 '12 at 13:14
  • @Procrastinator Quadratic loss is usually not a good choice. Consider the error structure of the percentiles (under iid sampling): for gammas, there is much greater variation at the upper tail than in the middle. Moreover, errors are correlated. Thus least squares, although it works, may be far from optimal when relatively extreme percentiles are involved. – whuber Jun 11 '12 at 13:22
  • @whuber I agree, I just mentioned the empirical fit I observed, not optimality. What loss function would you recommend to get a better result? Do you know of any theoretical result about the relationship between the choice of the loss function and the rate of convergence? – Procrastinator Jun 11 '12 at 13:25
  • Rate of convergence of what, @Proc? I think the key issue is getting an appropriate fit. The percentiles themselves have well-known distributions conditional on the underlying distribution itself, so one might approach this with a crude initial estimate of parameters (e.g., least squares!) followed by a generalized least squares re-estimate of the parameters. When there are more percentiles than parameters to estimate, and those percentiles cover a wide range (e.g., they're not 90-91-92-93), it seems likely the LS and GLS solutions will be easy to obtain and numerically stable. – whuber Jun 11 '12 at 13:30
  • @whuber I meant the rate of convergence of the estimators (if possible, under a certain choice of the loss function). What do you mean by 'optimal'? – Procrastinator Jun 11 '12 at 13:33
  • Oh, this wonderful smell of oil and metal... a new wheel being reinvented. Econometricians proposed the generalized method of moments three decades ago, see http://www.citeulike.org/user/ctacmo/article/1155588 (attribution to the work of Ferguson in the 1950s is made in the paper, I believe). This methodology would be a part of any asymptotics class if math statisticians were not so snobby about the possibility of wonderful methods emerging in other disciplines; econometricians teach their students GMM as an all-encompassing principle rather than the likelihood. – StasK Jun 11 '12 at 13:40
  • @Proc The question of optimality is one I earlier addressed to the OP, who demurred, leaving it up to us to consider what makes sense in general. Originally, I imagined a situation in which the purpose of the fitting might be to make an estimate, in which case "optimal" can take on its usual meanings for a statistical estimation problem. Thus, for instance, an optimal fit when the ultimate estimator is a 90th percentile would heavily weight the upper percentiles in the data, whereas an optimal fit when the mean is the estimand would likely weight the data very differently. – whuber Jun 11 '12 at 13:40
  • @Stas What does this problem have to do with GMM? I don't see any moments in evidence! – whuber Jun 11 '12 at 13:41
  • "Moments" is a bad name they got stuck with, admittedly. The method in fact works with estimating equations, and I hope you do see some in this example, @whuber. To rephrase, the GMM theory covers anything that can be done with the quadratic loss for estimating equations, including higher order asymptotics and weird dependencies between observations or equations. – StasK Jun 11 '12 at 14:16
  • @StasK - Gee, I had exposure to GMM in my two semester math-stats class taught by Bickel many years ago :) And one can't claim that that particular "grandson" of Neyman through Lehmann isn't a math statistician... – jbowman Jun 11 '12 at 14:21
  • Thank you for the clarification, @Stas. I was not aware of the generality of GMM. – whuber Jun 11 '12 at 15:01
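The two-step procedure whuber sketches in the comments (a crude least-squares fit, then a generalized least-squares refit) can be written out using the standard asymptotic covariance of sample quantiles, $\min(p_i,p_j)(1-\max(p_i,p_j))/(n\, f(q_i) f(q_j))$. A Python sketch under those assumptions; the unknown sample size $n$ only rescales the covariance matrix, so it drops out of the GLS minimizer:

```python
# Two-step fit: ordinary least squares on the quantiles, then GLS
# reweighted by the inverse asymptotic covariance of the sample
# quantiles, evaluated at the first-step estimate. The 1/n factor in
# that covariance is unknown but cancels in the minimization.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

probs = np.array([0.2, 0.3, 0.5, 0.9])
quants = np.array([0.3936833, 0.4890963, 0.6751703, 1.3404074])

def quantile_cov(shape, scale):
    """Asymptotic covariance of the sample quantiles, up to the 1/n factor."""
    q = gamma.ppf(probs, shape, scale=scale)
    f = gamma.pdf(q, shape, scale=scale)
    i, j = np.meshgrid(probs, probs, indexing="ij")
    cov = np.minimum(i, j) * (1.0 - np.maximum(i, j))
    return cov / np.outer(f, f)

def gls_loss(log_params, cov_inv):
    shape, scale = np.exp(log_params)
    r = gamma.ppf(probs, shape, scale=scale) - quants
    return r @ cov_inv @ r

# Step 1: ordinary least squares (identity weight matrix).
ols = minimize(gls_loss, np.log([1.0, 1.0]),
               args=(np.eye(len(probs)),), method="Nelder-Mead")
# Step 2: GLS, weighting by the inverse covariance at the OLS estimate.
cov_inv = np.linalg.inv(quantile_cov(*np.exp(ols.x)))
gls = minimize(gls_loss, ols.x, args=(cov_inv,), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(gls.x)
```

Because the upper-tail quantile has much larger sampling variance for a gamma, the GLS step downweights the 90% quantile relative to the middle ones, which is exactly the concern whuber raises about plain quadratic loss.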

1 Answer


I don't know what was in the other post, but I have a response. One can look at the order statistics, which represent specific quantiles of the distribution: namely, the $k$'th order statistic, $X_{(k)}$, is an estimate of the $100 \cdot k/n$'th percentile of the distribution. There is a famous paper in Technometrics 1960 by Shanti Gupta that shows how to estimate the shape parameter of a gamma distribution using the order statistics. See this link: http://www.jstor.org/discover/10.2307/1266548
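The order-statistic/quantile correspondence above is easy to check by simulation. A small sketch, with a gamma(shape=3, scale=0.25) chosen purely for illustration:

```python
# Check that the k-th order statistic X_(k) of an iid sample of size n
# estimates the (100*k/n)-th percentile of the sampled distribution.
# The gamma(3, 0.25) here is an arbitrary illustrative choice.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
n = 100_000
x = np.sort(rng.gamma(shape=3.0, scale=0.25, size=n))

k = n // 2                        # pick the middle order statistic ...
order_stat = x[k - 1]             # ... X_(k), with 1-based indexing
true_quantile = gamma.ppf(k / n, 3.0, scale=0.25)
print(order_stat, true_quantile)  # the two agree closely for large n
```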

  • I TeXed one part of your answer (leaving the content identical) but I'm a little confused and think there may be a typo or something. Re: "One can look at the order statistics which represent specific quantiles of the distribution.....". Do you mean quantiles of the empirical distribution? Also, the $k$'th order statistic usually refers to the $k$'th smallest value, not the $k/n$'th quantile of the empirical distribution, right? Can you clarify (sorry if I'm being dense)? – Macro Jun 11 '12 at 13:11
  • If $n$ is the sample size, the $k$th order statistic represents an estimate of the $100 \cdot k/n$ percentile of the distribution being sampled. – Michael R. Chernick Jun 11 '12 at 13:15
  • @MichaelChernick, I've slightly edited your answer to make that clear - hopefully this looks ok. – Macro Jun 11 '12 at 13:21