
What is the best way to show that, when sampling from a normal distribution, the sample mean and sample variance are independent? I know the theory behind this result, but I would like to show it using a simulation in R. For now, what I did is:

R<-1000
n<-10
mu<-5
sd<-3

mc<-s2<-vector(mode="numeric",length=R)

for (i in 1:R) {
  x <- rnorm(n, mu, sd)
  mc[i] <- mean(x)
  s2[i] <- var(x)
}
plot(mc, s2)

But I know this is not enough to prove independence. Is there any better way? More generally, I would like to know how to show in R that two random variables are independent. I hope this is not a stupid question; I just started learning R recently.

Carlos233
  • I think what you have is perfect! We can see from the plot that the sample mean and sample variance have no relationship. If you really want something quantitative, independence would probably require some kind of hypothesis test, but showing that they have zero correlation should be easy. – John Madden Feb 24 '24 at 20:49
  • @JohnMadden: Good, but not quite perfect. He forgot to use set.seed to give a replicable analysis. – Ben Feb 24 '24 at 20:59
  • There is no way to "prove independence" by examining a sample. You can make plots, as you have done, but those don't "prove" anything in the usual sense of the word. – jbowman Feb 24 '24 at 20:59
  • To illustrate independence, a versatile and insightful technique is to construct a wandering schematic plot based on the scatterplot of the two variables. The idea is that the changes in the conditional distribution of one variable (based on the other) should be attributable solely to sampling error. You can find R code to do this at https://stats.stackexchange.com/a/106083/919. – whuber Feb 24 '24 at 22:52
  • Very interesting. I’d also suggest the use of Hoeffding’s $d$ which is a general measure of dependence, implemented in the R Hmisc package hoeffd function. – Frank Harrell Feb 25 '24 at 12:51
  • You won't prove independence. But you can look for dependence by, for example, taking thin slices across the bivariate distribution and seeing whether the distributional shapes are similar, as well as their mean and spread. For this purpose, particularly with small $n$, I'd suggest looking at variance$^{1/3}$ (which will be approximately normal), since comparisons of the tail behavior of skewed distributions can be more difficult. Since the sample mean will be symmetric, for the variance conditional on the mean it should suffice to take a slice near the center and a few further from the center. – Glen_b Feb 25 '24 at 23:17
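Several of these suggestions are easy to try directly. Below is a minimal sketch, assuming the Hmisc package is installed for hoeffd, that re-runs the question's simulation (now with set.seed, per Ben's comment), tests for zero correlation, computes Hoeffding's $D$, and compares the distribution of variance$^{1/3}$ across thin slices of the sample mean:

set.seed(1)  # make the analysis replicable
R <- 1000; n <- 10; mu <- 5; sd <- 3
mc <- s2 <- numeric(R)
for (i in 1:R) {
  x <- rnorm(n, mu, sd)
  mc[i] <- mean(x)
  s2[i] <- var(x)
}

# zero correlation: necessary, but not sufficient, for independence
cor.test(mc, s2)

# Hoeffding's D, a general measure of dependence
library(Hmisc)
hoeffd(mc, s2)

# conditional slices: under independence, the distribution of variance^(1/3)
# should look similar within each thin slice of the sample mean
slice <- cut(mc, quantile(mc, seq(0, 1, by = 0.2)), include.lowest = TRUE)
boxplot(s2^(1/3) ~ slice, xlab = "slice of sample mean", ylab = "variance^(1/3)")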

1 Answer


But I know this is not enough to prove independence.

To prove independence, you need a mathematical result. Simulation is an empirical tool, and no finite number of simulations will prove the result.
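For reference, the classical result (provable, for example, via Basu's theorem) is that for an i.i.d. sample from a normal distribution,

$$\bar X \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad \text{with } \bar X \text{ and } S^2 \text{ independent}.$$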

That being said, your code looks fine (if a bit in need of a linter). You could repeat your whole simulation many times and, each time, estimate the correlation between the simulated sample means and sample variances. Here is an example:

n <- 100
nsims <- 1000

rhos <- replicate(nsims, {
  # draw nsims samples of size n and record (mean, variance) pairs
  sims <- replicate(nsims, {
    x <- rnorm(n)
    xbar <- mean(x)
    s2 <- var(x)
    c(xbar, s2)
  })
  # correlation between the sample means and the sample variances
  cor(t(sims))[1, 2]
})

hist(rhos)

[Figure: histogram of the simulated correlation coefficients, clustered around 0]

Here is what this code does:

  • The inner replicate draws n = 100 observations from a standard normal distribution and computes the sample mean and sample variance, repeating this nsims = 1000 times to give 1000 (mean, variance) pairs.
  • The correlation between the means and the variances across those pairs is computed.
  • Steps 1 and 2 are repeated nsims = 1000 times by the outer replicate.
  • This produces 1000 correlation coefficients, derived from the procedure described.

The histogram shows that these correlation coefficients cluster around 0, which is consistent with the two quantities being uncorrelated. Note that zero correlation is necessary but not sufficient for independence, and, as I mentioned, this is only empirical evidence; definitive evidence would require a mathematical argument.
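As a rough quantitative follow-up, here is a minimal sketch assuming the standard large-sample approximation that a sample correlation computed from $m$ independent pairs has standard deviation about $1/\sqrt{m}$; it checks that the simulated coefficients are centered at 0 with roughly the spread independence would predict.

# under independence, correlations computed from nsims pairs should be
# centered at 0 with standard deviation roughly 1 / sqrt(nsims)
mean(rhos)
sd(rhos)
1 / sqrt(nsims)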