I have a scatter plot of (for example) height against age. For an individual point, how does one calculate the percentile of its height given its age?
Suggestions in R would be most appreciated. Thanks!
The dataset would have to be enormous for empirical approaches to be sufficiently precise, and percentiles of the marginal distribution of height are not much help because they ignore age. I suggest quantile regression, with age modeled flexibly (e.g., using restricted cubic splines). Here is an example in R.
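library(rms)  # loads quantreg, Hmisc, SparseM packages too
dd <- datadist(mydata)    # summarize predictor ranges for Predict() and nomogram()
options(datadist = 'dd')
f <- Rq(ht ~ rcs(age, 5), tau = .25, data = mydata)  # model the 25th percentile of ht
f                         # print the fitted model
plot(Predict(f))          # predicted 25th percentile against age, with confidence bands
plot(nomogram(f))         # draw a nomogram to predict the quantile manually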
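The model above estimates a single quantile (the 25th) as a function of age. To place an individual point, one possible extension is to fit a grid of quantiles and interpolate where the observed height falls among the predicted quantiles at that age. The following is only a sketch and not part of the original answer: the simulated `mydata`, the example point `new_pt`, the grid `taus`, and the use of `quantreg::rq` with `splines::ns` (in place of `Rq` with `rcs`) are all assumptions made for illustration.

```r
library(quantreg)  # rq()
library(splines)   # ns(), natural cubic splines

# Simulated stand-in for the real data (assumption, for illustration only)
set.seed(1)
n <- 1000
mydata <- data.frame(age = runif(n, 2, 18))
mydata$ht <- 80 + 6 * mydata$age + rnorm(n, sd = 5)

# Fit one quantile regression per tau on a grid of quantiles
taus <- seq(0.05, 0.95, by = 0.05)
fit <- rq(ht ~ ns(age, df = 4), tau = taus, data = mydata)

# Hypothetical individual whose height percentile we want
new_pt <- data.frame(age = 10, ht = 143)

# Predicted height at each tau for this age; sorting guards against
# quantile curves that cross
q_hat <- sort(as.numeric(predict(fit, newdata = new_pt)))

# Interpolate the observed height among the predicted quantiles;
# rule = 2 clamps the result to the range of the tau grid
pct <- approx(x = q_hat, y = taus, xout = new_pt$ht, rule = 2)$y
pct
```

The result is only as fine as the tau grid, and a point outside the fitted range is reported as the lowest or highest tau, so extreme percentiles need a wider grid and considerably more data.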
'The' percentile for a given age implies some sort of regression (e.g., you can find 'the' mean predicted height for a given age).
Once you have found this, the result depends on your assumptions. If you want no assumptions beyond those of the regression, find the proportion of heights in your original data that are smaller than the height predicted for your age (that is, use the empirical distribution of height).
Otherwise, you can fit whatever model you like to the marginal distribution of height and then find the percentile of the predicted height in that distribution.
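The two options above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the simulated `mydata`, the linear regression used to get the predicted height, the target age of 10, and the choice of a normal distribution as the marginal model are all made up for the example.

```r
# Simulated stand-in for the real data (assumption, for illustration only)
set.seed(2)
n <- 1000
mydata <- data.frame(age = runif(n, 2, 18))
mydata$ht <- 80 + 6 * mydata$age + rnorm(n, sd = 5)

# Regression step: mean predicted height at the age of interest
fit <- lm(ht ~ age, data = mydata)
pred_ht <- unname(predict(fit, newdata = data.frame(age = 10)))

# Option 1: empirical distribution, the share of observed heights
# that are smaller than the predicted height
emp_pct <- mean(mydata$ht < pred_ht)

# Option 2: parametric marginal model (a normal distribution is assumed here)
par_pct <- pnorm(pred_ht, mean = mean(mydata$ht), sd = sd(mydata$ht))

c(empirical = emp_pct, normal = par_pct)
```

Both numbers locate the predicted height within the overall distribution of heights across all ages, so they describe the marginal distribution and not the spread of heights at one particular age.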