
I have been presented with a problem of this kind: suppose I know the values of $k$ quantiles of a continuous random variable $X$

$$X_{1\%} = x_1, X_{5\%} = x_2, \dots , X_{99\%} = x_{k}$$

so that

$$ F_X(x_1)=1\%, F_X(x_2)=5\%, \dots, F_X(x_k)=99\% $$

From this information I want to plot the PDF.

I thought that I could proceed this way:

  • interpolate $F_X(x)$ to get a smooth CDF (for instance, by spline interpolation)
  • differentiate the smoothed CDF numerically at some points to obtain the PDF.
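The two steps above can be sketched with scipy as follows. This is only an illustration: the seven quantile levels and the standard-normal quantiles are hypothetical inputs, and `PchipInterpolator` is a monotone interpolant added here for comparison, not part of the procedure described in the question.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator
from scipy.stats import norm

# Hypothetical inputs: seven quantile levels and, for illustration only,
# the matching quantiles of a standard normal distribution.
probs = np.array([0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99])
x_q = norm.ppf(probs)

# Step 1: interpolate the CDF through the points (x_q, probs).
cdf_cubic = CubicSpline(x_q, probs)        # smooth, but not guaranteed monotone
cdf_pchip = PchipInterpolator(x_q, probs)  # monotone, but only C^1

# Step 2: differentiate the interpolant to get a density estimate.
pdf_cubic = cdf_cubic.derivative()
pdf_pchip = cdf_pchip.derivative()

# Compare the smallest density value of each estimate on a grid.
grid = np.linspace(x_q[0], x_q[-1], 201)
print("min of cubic-spline PDF:", pdf_cubic(grid).min())
print("min of PCHIP PDF:       ", pdf_pchip(grid).min())
```

A negative minimum for the cubic-spline density would indicate that the interpolated CDF is not monotone between the given quantiles. Note that neither interpolant says anything about the tails outside $[x_1, x_k]$.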

Are there other more direct methods to address this problem? Do you think my solution is solid?

Thank you.

gioxc88
  • If you can assume a parametric distribution, you can also fit the CDF to the quantiles, using a black-box optimizer to find the relevant parameters. – Tim May 24 '18 at 10:02
  • Thanks for the comment. In my particular case I don't want to assume any parametric distribution. But just in case, how does the optimization work if, for example, I have more conditions than parameters to be estimated? For instance, if I have 3 quantiles and I want to use an optimizer to find $\mu$ and $\sigma$ of a normal distribution. – gioxc88 May 24 '18 at 10:15
  • Say you have $q_p = x$; then you seek the parameters $\theta$ that minimize the difference between $F_\theta(x)$ and $p$, as measured by some loss, using a black-box optimizer. – Tim May 24 '18 at 10:30
  • How many quantiles do you have? Are the true values known, or is there some kind of sampling or noise involved? – user20160 May 28 '18 at 08:57
  • I have 7 quantiles, but regardless, I would like to have an opinion about the procedure I described. Does it make sense? I ask because I tried cubic spline interpolation and it does not work well. – gioxc88 May 28 '18 at 09:23
  • The problem is underdetermined. Its focus on the PDF suggests selecting a solution in which the estimated PDF is as close to a true underlying PDF as possible. This indicates the question needs (at a minimum) two more criteria: (1) delineation of a set of possible distributions $F$ and (2) a formula to quantify the difference between two PDFs. I would like to suggest that you edit the question to supply this essential missing information. – whuber May 28 '18 at 13:01
  • @whuber thank you for your comment, but that is all the information I have. Given that those are the only inputs at your disposal, what would you do to get a PDF? I am not concerned with things like convergence. You have to know that, most of the time, people asking questions in industry don't know what they are talking about and yet they demand answers. – gioxc88 May 28 '18 at 15:13
  • I'm sure you have more information than that--unless this is just a textbook question. The application you have in mind, and what you know about the quantity represented by $F,$ will give you essential information. – whuber May 28 '18 at 17:35
  • In what sense "does not work well"? In general, 7 points does indeed seem a bit of a stretch... If you had more points you could try differentiation through a kernel smoother, but that is out of the question with just 7 points. – usεr11852 May 28 '18 at 20:10
  • I am thinking that something like Estimating Derivatives for Samples of Sparsely Observed Functions, with Application to On-line Auction Dynamics (Link to pre-print) might be a guide (in their case the authors use local quadratic derivative estimators), but that requires a large sample of sparsely sampled curves, not a single 7-point reading. – usεr11852 May 28 '18 at 20:19
  • In principle the approach is sound, so there seems to be some information missing. If there really is just the information as mentioned in the comments (please update the question), then you're stuck. Maybe you could give a little more background. – cherub May 29 '18 at 09:29
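The parametric fit suggested in the comments above, including the case of more quantiles than parameters, can be sketched as follows. The squared-error loss, the Nelder-Mead optimizer, and the example data are assumptions made for illustration; the comments only say "some loss" and "a black-box optimizer".

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: three quantiles of a Normal(mu=2, sigma=3), so there
# are more conditions (3) than parameters (2).
probs = np.array([0.05, 0.50, 0.95])
x_q = norm.ppf(probs, loc=2.0, scale=3.0)

def loss(theta):
    """Squared-error loss between the model CDF at the quantiles and probs."""
    mu, log_sigma = theta  # optimise log(sigma) so that sigma stays positive
    model_p = norm.cdf(x_q, loc=mu, scale=np.exp(log_sigma))
    return np.sum((model_p - probs) ** 2)

# Crude starting point taken from the quantiles themselves.
theta0 = [np.median(x_q), np.log(np.std(x_q))]
res = minimize(loss, theta0, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-12})

mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print("fitted mu, sigma:", mu_hat, sigma_hat)
```

With more conditions than parameters the loss generally cannot be driven to zero on real data; the optimizer then returns the parameters that match the quantiles best under the chosen loss.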

1 Answer


Your approach is valid if you interpolate the quantiles with the integral of a cubic B-spline (https://en.m.wikipedia.org/wiki/B-spline), subject to the condition that all B-spline coefficients are nonnegative. Because the B-spline basis functions are themselves nonnegative, nonnegative coefficients give a nonnegative PDF, so the integrated curve is monotone, as a CDF must be. Ordinary cubic splines do not guarantee this.

There is then no need to differentiate the spline numerically: the PDF is the B-spline itself, evaluated in the usual way. The integral of the B-spline curve is also available in closed form, without numerical integration. Any serious spline library provides it, for example Python's scipy module.
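As a minimal sketch of the point above, the following builds a cubic B-spline with nonnegative coefficients using `scipy.interpolate.BSpline` and obtains its integral with `antiderivative()`. The knot vector and the coefficient values are arbitrary, made-up numbers chosen only to show the mechanics.

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3  # cubic B-spline for the PDF

# Hypothetical clamped knot vector on [0, 1] with seven basis functions.
t = np.concatenate(([0.0] * k, np.linspace(0.0, 1.0, 5), [1.0] * k))

# Arbitrary nonnegative coefficients (not fitted to anything).
c = np.array([0.0, 0.5, 2.0, 3.0, 2.0, 0.5, 0.0])

pdf_raw = BSpline(t, c, k)          # unnormalised density
mass = pdf_raw.integrate(0.0, 1.0)  # exact integral, no quadrature needed

pdf = BSpline(t, c / mass, k)       # normalised density
cdf = pdf.antiderivative()          # quartic spline, also in closed form

grid = np.linspace(0.0, 1.0, 101)
cdf_values = cdf(grid) - cdf(0.0)   # CDF measured from the left end
print("total mass after normalising:", cdf_values[-1])
```

Both the density and the CDF come from ordinary spline evaluation, so no numerical differentiation or integration is involved.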

The remaining question is how to find the B-spline coefficients in this case. I don't have a perfect answer right now, although an efficient algorithm should be possible. A simple brute-force way would be to minimize the squared residuals between the integrated spline and the quantiles with a numerical minimizer, while constraining the coefficients to be nonnegative.
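One way to sketch this brute-force fit: the integrated spline is linear in the coefficients, so the problem becomes a bounded linear least-squares problem, solved here with `scipy.optimize.lsq_linear`. The support `[lo, hi]`, the number of coefficients, the small smoothness penalty `lam`, and the example quantiles are all assumptions made for this illustration and would need to be chosen for real data.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import lsq_linear
from scipy.stats import norm

# Hypothetical inputs: seven quantiles of a standard normal.
probs = np.array([0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99])
x_q = norm.ppf(probs)

k = 3               # cubic B-splines for the PDF
lo, hi = -4.0, 4.0  # assumed support; must extend beyond the outer quantiles
n_coef = 16         # assumed number of B-spline coefficients
t = np.concatenate(([lo] * k, np.linspace(lo, hi, n_coef - k + 1), [hi] * k))

def basis_cdf(i, x):
    """Integral of the i-th B-spline basis function from lo to x."""
    e = np.zeros(n_coef)
    e[i] = 1.0
    F = BSpline(t, e, k).antiderivative()
    return F(x) - F(lo)

# Conditions: CDF(x_q) = probs, plus total mass CDF(hi) = 1.
targets_x = np.append(x_q, hi)
targets_p = np.append(probs, 1.0)
A = np.column_stack([basis_cdf(i, targets_x) for i in range(n_coef)])

# Assumed extra: a small second-difference penalty on the coefficients,
# because there are more coefficients than conditions.
lam = 1e-3
D2 = np.diff(np.eye(n_coef), n=2, axis=0)
A_aug = np.vstack([A, lam * D2])
b_aug = np.concatenate([targets_p, np.zeros(n_coef - 2)])

# Least squares with the coefficients constrained to be nonnegative.
res = lsq_linear(A_aug, b_aug, bounds=(0.0, np.inf))
c = res.x

pdf = BSpline(t, c, k)      # fitted density
cdf = pdf.antiderivative()  # fitted CDF, up to the constant cdf(lo)
cdf_error = np.abs(cdf(x_q) - cdf(lo) - probs).max()
print("max CDF error at the quantiles:", cdf_error)
```

Because of the penalty, the fit is not an exact interpolation; lowering `lam` trades smoothness of the density for a closer match to the quantiles.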

I can add a numerical example in a few days if desired.

olq_plo
  • Thank you very much for your answer. I use Matlab, and my problem was exactly the one you mentioned about the negative coefficients. I will expand my original question to show you my results. Thanks for the time you spent looking into it! – gioxc88 May 31 '18 at 07:40
  • Cool, I am glad I could help! Sorry for not providing a numerical example right away, but I am traveling and only have my phone at hand. – olq_plo May 31 '18 at 23:10