2

I want to generate 200 samples, each of size 10, from a t-distribution with 1 degree of freedom in R.

I use this code

set.seed(1234)
B <- matrix(rt(10*200, 1), 200)

But when I look at sample number 167 (B[167,]), I find this very large value: 602.1691029

And this is a strange thing in the t-distribution. What is wrong here?

2 Answers

4

And this is a strange thing in the t-distribution.

No, it isn't, not with $1$ degree of freedom: the $t_1$ distribution is the standard Cauchy distribution.

The tails of the Cauchy are so heavy that even its mean is undefined (its defining integral does not converge). Very large deviations happen reasonably often, and the more values you generate, the bigger the largest-magnitude value will tend to be. Indeed, with the Cauchy it grows roughly linearly with sample size: $\text{median}(\max_i |X_i|)$ increases approximately in proportion to $n$. With $2000$ standard Cauchy values, the median of the distribution of the largest-magnitude one is over $1800$, and the median of the distribution of the second-largest-magnitude observation is over $750$.
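As a rough check of these figures, here is a minimal R sketch comparing a closed form for the median of the largest absolute value with a simulation. The helper name `median_max_abs_cauchy`, the seed, and the number of replications are illustrative choices made here, not part of the original answer.

```r
# Median of max_i |X_i| for n iid standard Cauchy values.
# P(max |X_i| <= m) = ((2 / pi) * atan(m))^n; set this to 1/2 and solve for m.
median_max_abs_cauchy <- function(n) tan(pi / 2 * 0.5^(1 / n))

n <- 2000
exact <- median_max_abs_cauchy(n)        # closed form
approx_linear <- 2 * n / (pi * log(2))   # large-n approximation, proportional to n

# Simulation check (seed and replication count are arbitrary choices)
set.seed(1)
sim_max <- replicate(2000, max(abs(rt(n, df = 1))))
sim_median <- median(sim_max)

c(exact = exact, approx_linear = approx_linear, simulated = sim_median)
```

The closed form and the linear approximation should agree closely and land a little above $1800$, with the simulated median nearby up to Monte Carlo error.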

Note that $P(|X|\geq 602)\approx 0.001$. If you generate $2000$ of them, you expect about $2$ of those observations to be at least that large in magnitude.
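This tail probability can be checked directly in R with `pt`. The sketch below is only an illustration; the variable names are chosen here, and `n_draws` assumes the $10 \times 200$ layout from the question.

```r
# Two-sided tail probability P(|X| >= 602) for a t distribution with 1 df
p_tail <- 2 * pt(-602, df = 1)

# Same quantity from the Cauchy CDF, as a cross-check
p_tail_cauchy <- 2 * pcauchy(-602)

# Expected number of the 10 * 200 = 2000 draws with |X| >= 602
n_draws <- 10 * 200
expected_count <- n_draws * p_tail

# Probability that at least one of the 2000 draws is that extreme
p_at_least_one <- 1 - (1 - p_tail)^n_draws

c(p_tail = p_tail, expected_count = expected_count,
  p_at_least_one = p_at_least_one)
```

The last quantity is the relevant one for the question: it is the chance of seeing at least one value of magnitude $602$ or more somewhere in the whole matrix.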

Rather than being surprised to see one of that size, you would often see even larger ones.

What is wrong here?

Nothing, this is typical. You might like to read more about the Cauchy and other t distributions with low d.f.

https://en.wikipedia.org/wiki/Cauchy_distribution

https://en.wikipedia.org/wiki/Student's_t-distribution

A number of posts on this site discuss interesting properties of the Cauchy ($t_1$) distribution.

Glen_b
  • 282,281
  • +1 This is an important rejoinder to the comments, all of which (at present) suggest $602$ is possible (or "feasible") but unusual. As you show, absolute values this large are to be expected in the sample. – whuber Sep 27 '21 at 13:15
  • (+1). Indeed, the median of the maximum of $n$ absolute values of a standard Cauchy random variable is $\tan(2^{-1-1/n}\pi)$ according to Mathematica. – COOLSerdash Sep 27 '21 at 13:48
  • Thanks. It should be possible to derive the exact value pretty directly from the transformed median of a beta (specifically one with second-shape parameter $\beta=1$ -- so it should be fairly doable once you account correctly for the $|X|$ part), but I figured the exact value was not needed. – Glen_b Sep 27 '21 at 16:28
  • If I recall correctly, for the uniform distribution the median of the largest order statistic comes out to $\left(\frac12\right)^{1/n}$, so you would just take $F^{-1}$ of that, which looks to correspond. That it then comes out almost proportional to $n$ isn't hard to see. – Glen_b Sep 27 '21 at 16:44
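
The derivation sketched in these comments can be written out as follows. The symbols $M_n$, $G$ and $m_n$ are notation introduced here for illustration: $M_n=\max_i|X_i|$, $G$ is the distribution function of $|X|$ for a standard Cauchy $X$, and $m_n$ is the median of $M_n$.

$$P(M_n\le m)=G(m)^n,\qquad G(m)=P(|X|\le m)=\frac{2}{\pi}\arctan(m)$$

$$G(m_n)=\left(\tfrac12\right)^{1/n}\;\Longrightarrow\; m_n=\tan\!\left(\frac{\pi}{2}\,2^{-1/n}\right)=\tan\!\left(2^{-1-1/n}\pi\right)$$

$$2^{-1/n}\approx 1-\frac{\ln 2}{n}\;\Longrightarrow\; m_n=\cot\!\left(\frac{\pi}{2}\left(1-2^{-1/n}\right)\right)\approx\frac{2n}{\pi\ln 2}\approx 0.918\,n$$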
1

Number 135 is even larger. I guess the problem comes from the fact that you're using df = 1, which looks odd: the uncertainty is just too large with a single degree of freedom.

From about df = 5 upward, values this large become extremely unlikely.
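
To put numbers on the comparison between df = 1 and df = 5, here is a small R sketch. The threshold $602$ comes from the question; the variable names and the choice of the 99.9% quantile are illustrative assumptions.

```r
# Two-sided tail probabilities P(|T| >= 602) for df = 1 and df = 5
p_df1 <- 2 * pt(-602, df = 1)
p_df5 <- 2 * pt(-602, df = 5)

# 99.9% quantile of |T|: the magnitude exceeded by about 1 draw in 1000
q_df1 <- qt(1 - 0.001 / 2, df = 1)
q_df5 <- qt(1 - 0.001 / 2, df = 5)

rbind(df1 = c(p_tail = p_df1, q999_abs = q_df1),
      df5 = c(p_tail = p_df5, q999_abs = q_df5))
```

With df = 5 the tail probability at $602$ is negligible and the 99.9% quantile of $|T|$ is in single digits, whereas with df = 1 that quantile is in the hundreds.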

F. Privé
  • 231
  • You could expand on this. df=1 represents a Cauchy distribution, which has an infinite mean (and thus this behaviour is entirely expected). – Ben Bolker Sep 26 '21 at 19:15
  • @BenBolker "infinite" or just "undefined"? – r2evans Sep 26 '21 at 19:18
  • Technically, undefined, I guess. It's all a mess. It would be nice to describe the Cauchy as having an "infinite variance" (which, naively, is how it seems to behave), but that doesn't make sense since the variance is the variation around the (undefined) mean ... – Ben Bolker Sep 26 '21 at 19:25
  • @Ben An accurate and useful characterization is that the Cauchy distribution has an infinite absolute first moment. Therein lies the source of the difficulties. The variance need not be defined as variation around a mean, btw: it can be expressed in terms of the expectation of $(X-Y)^2$ where $(X,Y)$ are iid. However, when the mean is undefined, perforce the variance will be infinite. – whuber Sep 28 '21 at 13:29