
Is a distribution-free statistic necessarily a nonparametric statistic? Is there a distribution-free statistic that is not nonparametric?

Is a nonparametric statistic necessarily a distribution-free statistic? Is there a nonparametric statistic that is not distribution-free?

Are distribution-free statistics and nonparametric statistics (both viewed as measurable mappings of sample points) the same concept?

How is a nonparametric statistic defined? Is a nonparametric statistic defined as a statistic whose distribution doesn't depend on the distribution of the sample?

Thanks and regards!

Tim
  • This seems rather close to your earlier question. Is this the same question worded a different way, or an actually different issue? – Glen_b Mar 10 '13 at 02:03
  • @Glen_b: related but not exactly the same. For example, I didn't ask if all distribution free statistics are nonparametric statistics there. – Tim Mar 10 '13 at 02:18

1 Answer


I think the difference is that distribution-free statistics rely on limit theorems to show that you can construct tests of the weak null hypothesis, stated about some functional of the distribution function, that are asymptotically consistent and unbiased. I think there is some deficiency in the popular understanding of what constitutes "nonparametric" statistics, and that the precise statement of the null hypothesis eludes many applied statisticians.

For instance, suppose a two-sample test has data drawn from a parametric family of probability models, $X_1, X_2, \ldots, X_n \sim_{iid} \mathcal{F}_{\theta_1}$, $Y_1, Y_2, \ldots, Y_m \sim_{iid} \mathcal{F}_{\theta_2}$. The parameter then gives the obvious weak null hypothesis: $\mathcal{H}_0: \theta_1 = \theta_2$. However, we may instead want an asymptotically correct test about some functional of the distribution, independent of any parametric class: with two samples having no known class of probability models, $X_1, X_2, \ldots, X_n \sim_{iid} \mathcal{F}$, $Y_1, Y_2, \ldots, Y_m \sim_{iid} \mathcal{G}$, you define a "parameter" for each distribution, $\theta_1 = f(\mathcal{F})$, $\theta_2 = f(\mathcal{G})$, and test the weak null hypothesis $\mathcal{H}_0: \theta_1 = \theta_2$. In most cases, limit theorems such as the central limit theorem and the $\delta$-method show that there exist test statistics for which $n^k g \left( X, Y \right) \rightarrow_d \chi^2_p(\lambda)$, where the non-centrality parameter $\lambda$ is 0 only if the weak null hypothesis is true. There are usually some regularity conditions on the distributions $\mathcal{F}$ and $\mathcal{G}$, but in general they apply to a large array of probability models. A statistic $g(\cdot)$ like this is what we call distribution-free. Take the $t$-test: for normal, exponential, negative binomial, and Poisson data, the $t$-test is well powered to detect differences in the means between these distributions in any two-sample experiment, even when the mean does not parametrize the distribution.
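As a small sketch of that last point (the distributions, sample sizes, and seed here are arbitrary illustrative choices, assuming `numpy` and `scipy` are available): a Welch $t$-test applied to an exponential sample and a Poisson sample still behaves sensibly as a test of equal means, because the CLT makes the statistic asymptotically pivotal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500

# Under the weak null, both samples have mean 1, even though the
# distributions themselves differ (exponential vs. Poisson).
x = rng.exponential(scale=1.0, size=n)
y = rng.poisson(lam=1.0, size=n)
t_h0, p_h0 = stats.ttest_ind(x, y, equal_var=False)

# Under the alternative, the means differ by 0.5; with n = 500 the
# t-test detects this difference easily despite the non-normal data.
y_shift = rng.poisson(lam=1.5, size=n)
t_h1, p_h1 = stats.ttest_ind(x, y_shift, equal_var=False)
```

With this setup `p_h1` is tiny while `p_h0` is an ordinary draw from (approximately) a uniform null distribution, which is exactly the asymptotic behavior a distribution-free test of the weak null should have.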

Non-parametric statistics are traditionally used when researchers wish to state their strong null hypothesis as $\mathcal{H}_0: \mathcal{F} = \mathcal{G}$. That is, if there is a difference in the 99.99th percentile of these data (and even if there is only a symmetric, offsetting difference in the 0.01st percentile), they would like a test that is calibrated to reject the null hypothesis. We do not prespecify what we would believe to be a meaningful difference in the distribution for these data, whether it be a mean, median, or quantile-based summary. This is stupid in my mind. Tests like this can be done with the Kolmogorov–Smirnov test on the empirical distribution functions or using non-parametric smoothed kernel density estimates. The sample size required to detect the egregious symmetric quantile difference above is enormous, but such a test is powered to reject the strong null hypothesis in that case.
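A quick sketch of the contrast (again with illustrative distributions and an assumed `scipy` install): two normal samples with equal means but different spreads satisfy the weak null about the mean but violate the strong null $\mathcal{F} = \mathcal{G}$, and only the KS test is built to notice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

x = rng.normal(0.0, 1.0, size=n)
y = rng.normal(0.0, 1.5, size=n)  # same mean, different spread

# A t-test on the means has essentially no power here...
_, p_t = stats.ttest_ind(x, y, equal_var=False)

# ...but the two-sample KS test targets H0: F = G directly,
# comparing the full empirical distribution functions.
d, p_ks = stats.ks_2samp(x, y)
```

With these sample sizes `p_ks` is far below any conventional level, while `p_t` is just a draw from its null distribution: the KS test rejects the strong null for *any* distributional difference, given enough data.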

Rank-based statistics like the log-rank, Wilcoxon, and Mann–Whitney U tests are not tests of the strong null hypothesis, so in my mind they are not non-parametric.

AdamO
  • +1 For a thoughtful, well-reasoned response. Would it be possible to provide some references to support the distinctions you make between "nonparametric" and "distribution-free," or are these distinctions your own understanding? – whuber Mar 10 '13 at 15:05
  • I think this answer gives a much weaker version of what is normally considered a distribution-free statistic, and I don't suspect most statisticians would characterize the $t$-test as distribution-free, even though it is asymptotically so. A typical definition, at least in the classical literature, would entail that the distribution of the statistic be invariant to the underlying distribution of the data (with mild regularity conditions) for all finite sample sizes (even though the distribution of the statistic would, itself, vary with sample size). The KS statistic is one such example. – cardinal Mar 10 '13 at 15:21
  • @cardinal Can you explain the distinction between the T-test and KS for finite samples and general probability models? I don't understand how the KS would be any less "distributionally dependent". – AdamO Mar 10 '13 at 18:19
  • Hi @AdamO: For a quick start, you can look at this comment stream. – cardinal Mar 10 '13 at 18:21
  • The limiting distribution of the KS statistic is a Brownian bridge, but in finite samples the sampling distribution of the KS test statistic does depend on the underlying probability model, as @whuber pointed out in the following comment. A similar case holds for the $t$-test. Both have known limiting distributions when the null hypothesis is true. Both have complicated finite-sample sampling distributions. – AdamO Mar 10 '13 at 18:27
  • No, AdamO. @whuber's comment was incorrect in this (rare!) instance, as he himself noted later in that comment stream. My third comment gives the definitive proof of this fact. The limiting distribution is that of a functional of a Brownian bridge, but (!) as pointed out in the comments, we know more! The finite-sample distribution is completely invariant to the distribution of the data: it will vary with $n$, but not with $F$. :-) – cardinal Mar 10 '13 at 18:30
  • Gotcha. I glossed over the algebra, but once I looked at it, I said, "oh!". – AdamO Mar 10 '13 at 20:24
  • @cardinal: Thanks for your comment! We have the same definition for a distribution-free statistic. (1) I am not sure what your definition is for a nonparametric statistic? Is it a statistic that doesn't assume a parametric model on the distribution of the data? (2) Is there a distribution-free statistic that is not nonparametric? (3) Is there a nonparametric statistic that is not distribution-free? – Tim Mar 12 '13 at 21:08
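The finite-sample invariance cardinal describes in the comments above can be checked by simulation (a sketch; the choice of distributions, $n$, and replication count are arbitrary, and `scipy` is assumed). Because $F(X_i) \sim \mathrm{Uniform}(0,1)$ for any continuous $F$, the one-sample statistic $D_n = \sup_x |F_n(x) - F(x)|$ has the same null distribution whatever $F$ is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 25, 4000

def ks_null_draws(sampler, cdf):
    # Monte Carlo draws of D_n = sup_x |F_n(x) - F(x)| under H0: data ~ cdf.
    return np.array([stats.kstest(sampler(n), cdf).statistic
                     for _ in range(reps)])

d_norm = ks_null_draws(lambda k: rng.normal(size=k), stats.norm.cdf)
d_expo = ks_null_draws(lambda k: rng.exponential(size=k), stats.expon.cdf)

# The two simulated null distributions of D_n agree up to Monte Carlo error,
# even at n = 25: the statistic is distribution-free in finite samples.
q_norm = np.quantile(d_norm, [0.5, 0.95])
q_expo = np.quantile(d_expo, [0.5, 0.95])
```

The quantiles of `d_norm` and `d_expo` match to within simulation noise, while they would of course both change with $n$, which is exactly the classical finite-sample notion of distribution-freeness.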