
From Wikipedia

The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:

  • distribution free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such it is the opposite of parametric statistics. It includes non-parametric statistical models, inference and statistical tests.
  • non-parametric statistics (in the sense of a statistic over data, which is defined to be a function on a sample that has no dependency on a parameter), whose interpretation does not depend on the population fitting any parametrized distributions. Statistics based on the ranks of observations are one example of such statistics and these play a central role in many non-parametric approaches.
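The last sentence of the quote says that rank-based statistics play a central role. The Python sketch below is an illustration added here, not taken from the quoted article. It enumerates the exact null distribution of the Wilcoxon rank-sum statistic W, assuming continuous data (no ties) and two samples drawn from the same distribution. Under those assumptions every split of the ranks 1..m+n is equally likely, so the distribution depends only on the group sizes m and n and never on what the underlying distribution is. The function name `rank_sum_null_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_null_distribution(m: int, n: int) -> dict[int, Fraction]:
    """Exact null distribution of the Wilcoxon rank-sum W (sum of group-1 ranks).

    Assumes no ties and that both groups come from the same continuous
    distribution, so all C(m+n, m) rank assignments are equally likely.
    Only m and n enter the calculation; the data's distribution never does.
    """
    counts = Counter(sum(ranks) for ranks in combinations(range(1, m + n + 1), m))
    total = sum(counts.values())
    return {w: Fraction(k, total) for w, k in sorted(counts.items())}


if __name__ == "__main__":
    for w, p in rank_sum_null_distribution(2, 2).items():
        print(w, p)
```

Because no sample values appear anywhere in the function, the same table applies whether the data are normal, exponential, or anything else continuous. That is the sense in which such a statistic is distribution-free.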

I can't see the difference between the two cases: distribution-free methods and non-parametric statistics. Do they not both avoid assuming that the data come from some particular distribution? How do they differ?

Thanks and regards!

Tim
  • The definition you quote suggests the second is a subset of the first, but as they've actually defined them there (I'd swap about some parts of those definitions to the other term!) - and usually in practice - they seem to be used interchangeably. Nonparametric in this sense basically means 'infinite-parametric' while distribution-free methods are ones whose implementation and properties like null distributions don't depend on the distributional shape. Some books do make a distinction between the two; if I think of a reference I'll come back and add it. – Glen_b Feb 01 '13 at 00:17
  • @Glen_b: Thanks! Some references would be also appreciated! – Tim Feb 01 '13 at 13:25
  • @Glen_b: Why "the second is a subset of the first"? I feel the opposite. Could you let me know some references? Thanks! – Tim Mar 10 '13 at 00:44
  • "It includes non-parametric statistical models" is what gives that impression. References on definitions of the terms? Various books on distribution-free/nonparametric stats attempt definitions or distinctions; it's a long time since I read through a bunch of them, but standard books like Conover, Bradley, Daniel, Marascuilo & McSweeney, Lindley would be a start. Of those, I'd be inclined to check Bradley first. I only have Conover and Neave & Worthington to hand; I didn't spot a definition in either in a few minutes of looking - to my surprise; I thought both would have something. – Glen_b Mar 10 '13 at 01:03
  • @Glen_b: Thanks! Do you think any of the two meanings for nonparametric statistics in the quote has something to do with distribution-free statistics? – Tim Mar 10 '13 at 01:23
  • Clearly. If there was any doubt, the final sentence confirms it – Glen_b Mar 10 '13 at 01:56
  • @Glen_b: The final sentence in the quote just gives an example of a statistic whose "interpretation does not depend on the population fitting any parametrized distributions". Is this statistic the same as one whose distribution doesn't depend on the distribution of the sample? – Tim Mar 10 '13 at 02:16
  • Is it sufficient here to refer you to the question you just asked? You seem to be all over the shop today. – Glen_b Mar 10 '13 at 02:19
  • @Glen_b: The rank statistic is just an example of such a nonparametric statistic. My question is whether a nonparametric statistic in the second part is the same as a statistic which is distribution-free, i.e. one whose distribution doesn't depend on the distribution of the sample. – Tim Mar 10 '13 at 04:28
  • The question I pointed to makes a point about rank-based statistics and distribution-free statistics, does it not? The question I responded to with "the final sentence confirms it" is covered by that. An example is sufficient to answer that question. I don't understand the difficulty here. – Glen_b Mar 10 '13 at 05:03
  • @Glen_b: Rank statistic is an example of a nonparametric statistic. I was asking for general nonparametric statistics (not necessarily just rank statistics), which is also part of what I asked in a newer post http://stats.stackexchange.com/questions/51802/relations-between-distribution-free-statistics-and-nonparametric-statistics – Tim Mar 10 '13 at 05:06
  • @Glen_b: What are the names of the books by Bradley, Daniel, and Lindley? – Tim Mar 10 '13 at 05:24
  • Bradley: Distribution-Free Statistical Tests; Daniel: Applied Nonparametric Statistics. Lindley was a typo; unfortunately I'm not sure right now what book I had in mind when I wrote that. – Glen_b Jun 17 '21 at 00:12



An illustrative example of the difference - comparing samples from two populations.

With the first definition you might still compare the means of the two populations, somehow using the samples to draw inferences (for example, by comparing sample means). The population means are parameters, but you make no assumptions about the distribution (e.g. you do not assume the population is normally distributed). So this is "distribution free" statistics. Me, I do not think this should be called part of non-parametric statistics - because of the obvious logical contradiction.
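One way to make this first case concrete is a percentile bootstrap for the difference in means. This method is chosen here for illustration and is not one the answer names. No distributional family is assumed, because each group is resampled from its own observations, but the interval's coverage is only approximate. The function name, the sample data, and the defaults below are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_diff_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(X) - mean(Y).

    No distributional family is assumed: each group is resampled with
    replacement from its own observations. Coverage is only approximate.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = sorted(
        statistics.fmean(rng.choices(x, k=len(x)))
        - statistics.fmean(rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    k = int(n_boot * alpha / 2)  # number of resamples trimmed from each tail
    return diffs[k], diffs[n_boot - 1 - k]


x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3]  # hypothetical sample from population 1
y = [1.2, 2.0, 1.5, 2.8, 1.1, 2.4]  # hypothetical sample from population 2
print(statistics.fmean(x) - statistics.fmean(y), bootstrap_mean_diff_ci(x, y))
```

The target of inference is still a parameter-like quantity, the mean difference. This is why the answer places it under "distribution free" rather than under rank-based non-parametric methods.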

Under the second definition you do not consider at all a population mean or any other parameter. Instead you use methods such as comparisons of rankings. This is true non-parametric statistics.
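As an illustration of a rank comparison that never estimates a mean or any other parameter, here is a sketch of the Mann-Whitney U statistic computed from pooled ranks. It assumes no ties, and the helper names are hypothetical. It also checks that applying a strictly increasing transformation to every observation leaves U unchanged, because only the ordering of the data enters.

```python
import math


def ranks(values):
    """Ranks 1..N of the values; assumes no ties (continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def mann_whitney_u(x, y):
    """U = (sum of x's ranks in the pooled sample) - n_x(n_x + 1)/2.

    With no ties this equals the number of pairs (x_i, y_j) with x_i > y_j.
    """
    r = ranks(list(x) + list(y))
    n_x = len(x)
    return sum(r[:n_x]) - n_x * (n_x + 1) // 2


x = [1.3, 2.7, 0.4, 3.9]  # hypothetical data
y = [0.9, 1.8, 2.2]
print(mann_whitney_u(x, y))  # 7
# A strictly increasing transform changes every value but none of the ranks.
print(mann_whitney_u([math.exp(v) for v in x], [math.exp(v) for v in y]))  # 7
```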

Peter Ellis
  • Thanks! In both cases, do the distributions of their statistics both not rely on the true distribution of the sample? – Tim Mar 10 '13 at 00:34
  • Do you agree with Glen_b that "the second is a subset of the first"? – Tim Mar 10 '13 at 00:43
  • Tim, I don't think the second is a subset of the first; please reread my comment and you'll see that's not at all what I said. I was describing what the thing you quoted appeared to be saying was the case. If I say "It looks like Bill thinks X", it doesn't imply "Glen_b thinks X". I may think nothing of the kind. – Glen_b Mar 10 '13 at 01:08
  • Irrespective of who (if anyone) thinks so, no, the second case is not a subset of the first. The second case explicitly excludes interest in parameters, which are the focus of the first. – Peter Ellis Mar 10 '13 at 01:55
  • The mean (expectation value) of a distribution is not necessarily a parameter. The concept of parameter refers to a family of distributions which is parametrized. A single distribution outside of the context of a family doesn't have parameters. It does however have properties (like the expectation value and the median) which can be estimated and be the subject of a null hypothesis test. – A. Donda Dec 04 '21 at 18:53
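A. Donda's point is that a single distribution has properties, such as the median, that can be tested without any parametric family. The exact sign test illustrates this. Assume a continuous distribution whose median is m0 under the null hypothesis. Then each observation falls above m0 with probability 1/2, so the count of positive differences is Binomial(n, 1/2) whatever the shape of the distribution. This is a minimal sketch, and the function name and data are hypothetical.

```python
from math import comb


def sign_test_p_value(data, m0):
    """Exact two-sided sign test of H0: median = m0.

    Assumes a continuous distribution, so ties with m0 have probability zero;
    any observation exactly equal to m0 is dropped anyway.
    """
    diffs = [v - m0 for v in data if v != m0]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    # Under H0 the number of positive differences is Binomial(n, 1/2),
    # whatever the shape of the underlying distribution.
    tail = min(k, n - k)
    p_one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p_one_sided)  # double the smaller tail, cap at 1


data = [0.8, 1.5, -0.3, 2.2, 0.9, 1.1, -0.7, 0.4, 3.0, 0.6]  # hypothetical
print(sign_test_p_value(data, 0.0))  # 0.109375
```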

The text on Wikipedia has since been revised, and in my opinion makes more sense now. In particular, it quotes Kendall on a possible distinction between nonparametric and distribution free, which however has not been adopted:

The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others.

  1. The first meaning of nonparametric covers techniques that do not rely on data belonging to any particular parametric family of probability distributions.

    These include, among others:

    • distribution free methods, which do not rely on assumptions that the data are drawn from a given parametric family of probability distributions. As such it is the opposite of parametric statistics.
    • nonparametric statistics (a statistic is defined to be a function on a sample; no dependency on a parameter).

    Order statistics, which are based on the ranks of observations, are one example of such statistics.

    The following discussion is taken from Kendall's.[2]

    Statistical hypotheses concern the behavior of observable random variables.... For example, the hypothesis (a) that a normal distribution has a specified mean and variance is statistical; so is the hypothesis (b) that it has a given mean but unspecified variance; so is the hypothesis (c) that a distribution is of normal form with both mean and variance unspecified; finally, so is the hypothesis (d) that two unspecified continuous distributions are identical.

    It will have been noticed that in the examples (a) and (b) the distribution underlying the observations was taken to be of a certain form (the normal) and the hypothesis was concerned entirely with the value of one or both of its parameters. Such a hypothesis, for obvious reasons, is called parametric.

    Hypothesis (c) was of a different nature, as no parameter values are specified in the statement of the hypothesis; we might reasonably call such a hypothesis non-parametric. Hypothesis (d) is also non-parametric but, in addition, it does not even specify the underlying form of the distribution and may now be reasonably termed distribution-free. Notwithstanding these distinctions, the statistical literature now commonly applies the label "non-parametric" to test procedures that we have just termed "distribution-free", thereby losing a useful classification.

  2. The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions, and assumptions about the types of connections among variables are also made. These techniques include, among others:

    • non-parametric regression, which is modeling whereby the structure of the relationship between variables is treated non-parametrically, but where nevertheless there may be parametric assumptions about the distribution of model residuals.
    • non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.
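The second meaning covers model structure that grows with the data. Nadaraya-Watson kernel regression is one standard example, chosen here for illustration rather than named in the quoted text. The fitted "model" is the whole training set, so its effective size grows with n, and no functional form for the regression curve is fixed in advance. The Gaussian kernel, the bandwidth value, the data, and the function name are all assumptions for the sake of the sketch.

```python
import math


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x_query] with a Gaussian kernel.

    No functional form is fixed: the prediction is a locally weighted average
    of all training responses, so the 'model' grows with the training set.
    """
    weights = [math.exp(-0.5 * ((x_query - xi) / bandwidth) ** 2) for xi in x_train]
    return sum(w * yi for w, yi in zip(weights, y_train)) / sum(weights)


x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical data, here exactly 2x + 1
print(nadaraya_watson(x_train, y_train, 2.0, bandwidth=1.0))
```

Adding training points adds terms to the weighted average rather than refitting a fixed set of coefficients. That behavior is the "model grows in size" idea from the second meaning.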
A. Donda
  • A certain blindness is evident in characterizing "two unspecified continuous distributions are identical" as a distribution-free [sic] hypothesis. Continuity, although common, is an extremely restrictive condition on a distribution. I am truly puzzled by Kendall's remarks about hypothesis (c), which I believe the entire rest of the world would characterize as parametric (with two real parameters). – whuber Dec 04 '21 at 19:16
  • I think "continuous distributions" is here just an example, a step up in generalization from "normal form with both mean and variance unspecified". I agree with your second point though. – A. Donda Dec 04 '21 at 22:01
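A standard test of Kendall's hypothesis (d), that two unspecified continuous distributions are identical, is the two-sample Kolmogorov-Smirnov test. Its statistic D depends on the data only through the ordering of the pooled sample, so its null distribution is the same for every continuous F. This connects to the continuity point raised above. With ties (discrete data) the null distribution is no longer exactly distribution-free, and the usual tables tend to be conservative. The sketch below computes D and handles ties by stepping through distinct values. The function name and data are hypothetical.

```python
import math


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|.

    Steps through the distinct pooled values in order, so ties are handled;
    D depends on the data only through the ordering of the pooled sample.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        while i < n and xs[i] == t:  # move x's empirical CDF past t
            i += 1
        while j < m and ys[j] == t:  # move y's empirical CDF past t
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d


x = [0.1, 0.5, 0.9]  # hypothetical data
y = [0.3, 0.7, 1.2, 1.5]
print(ks_two_sample(x, y))  # 0.5
print(ks_two_sample([math.exp(v) for v in x], [math.exp(v) for v in y]))  # 0.5
```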