
I am just starting out with statistics (expected values, variances, etc.) and I still sometimes find them quite confusing. I am looking for software (like Wolfram, which I used to learn proper differentiation and integration) that would let me check my work. The only tools I could find online take data values and calculate their mean and variance, which is not the part I find troublesome. So, do you know of any software where I could specify the type of distribution of the random variable, its p.d.f., and the expression I'd like it to calculate, such as $E[\mathrm{something}]$ or $Var[\mathrm{something}]$?

  • MathStatica has impressive capabilities in this regard. For examples, see many of the answers by @Wolfies, such as https://stats.stackexchange.com/a/78336/919. – whuber Dec 11 '20 at 22:13

1 Answer


Analytic methods. Expected values and variances have specific formulas in terms of sums (for discrete distributions) and integrals (for continuous ones).
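For reference, the standard definitions are below, with $f$ the PDF and $g$ any function of interest (which covers the $E[\mathrm{something}]$ asked about in the question). Sums apply to discrete distributions, integrals to continuous ones:

$$E(X) = \sum_x x\,f(x) \quad\text{or}\quad E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx,$$

$$E[g(X)] = \sum_x g(x)\,f(x) \quad\text{or}\quad E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx,$$

$$Var(X) = E\big[(X-\mu)^2\big] = E(X^2) - \mu^2, \qquad \mu = E(X).$$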

Examples:

(1) Consider a binomial distribution $\mathsf{Binom}(n, p),$ with the PDF $f(x) = {n\choose x}p^x(1-p)^{n-x},$ for $x=0,1,\dots,n,$ which can be manipulated to show that $E(X) = \sum_{x=0}^n xf(x) = np.$ Similarly, $Var(X) = np(1-p).$
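One way to do the manipulation for $E(X)$: drop the $x=0$ term, use the identity $x{n\choose x} = n{n-1\choose x-1},$ and apply the binomial theorem to what remains:

$$E(X) = \sum_{x=1}^{n} x{n\choose x}p^x(1-p)^{n-x} = np\sum_{x=1}^{n}{n-1\choose x-1}p^{x-1}(1-p)^{(n-1)-(x-1)} = np\,\big(p + (1-p)\big)^{n-1} = np.$$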

(2) Consider the distribution $\mathsf{Exp}(\mathrm{rate} = \lambda),$ with density function $f(x) = \lambda e^{-\lambda x},$ for $x > 0,$ which can be manipulated (using integration by parts) to show that $E(X) = \int_0^\infty xf(x)\, dx = 1/\lambda.$ Similarly, $Var(X) = 1/\lambda^2.$
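The integration by parts, with $u = x$ and $dv = \lambda e^{-\lambda x}\,dx,$ goes as follows; the same step applied to $x^2$ gives $E(X^2) = 2/\lambda^2,$ hence the variance:

$$E(X) = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \Big[-x e^{-\lambda x}\Big]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = 0 + \frac{1}{\lambda},$$

$$E(X^2) = \Big[-x^2 e^{-\lambda x}\Big]_0^\infty + \int_0^\infty 2x\,e^{-\lambda x}\,dx = \frac{2}{\lambda}\,E(X) = \frac{2}{\lambda^2}, \qquad Var(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$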

Computational methods. For specific numerical values of the parameters, one can use software to sum or to do numerical integration, or to simulate a sample from the population, in order to get the numerical value of the mean or variance. In practice, these methods would most often be used for distributions with PDFs that are more difficult to handle analytically.

Still, here are some computations in R that check the results above. In R, dbinom is the binomial PDF, and rbinom generates a random sample from a binomial population. Suppose $n = 30, p = 1/3.$

x = 0:30; pdf = dbinom(x, 30, 1/3)
mu = sum(x * pdf); mu
[1] 10              # np
sum((x-mu)^2 * pdf)
[1] 6.666667        # np(1-p)
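The same summation works for any function $g(X),$ not just $X$ itself, which is closer to what the question asks for. A minimal sketch, assuming a hypothetical helper ev_discrete (not a base R function) that takes $g,$ the support, and the PDF values, and returns $E[g(X)]$ and $Var[g(X)]$:

```r
# Hypothetical helper (not part of base R): E[g(X)] and Var[g(X)]
# for a discrete distribution, by direct summation over the support.
ev_discrete <- function(g, x, pdf) {
  gx <- g(x)
  m  <- sum(gx * pdf)             # E[g(X)]
  v  <- sum((gx - m)^2 * pdf)     # Var[g(X)] = E[(g(X) - E[g(X)])^2]
  c(mean = m, var = v)
}

x   <- 0:30
pdf <- dbinom(x, 30, 1/3)         # Binom(n = 30, p = 1/3)

r1 <- ev_discrete(function(x) x, x, pdf)     # mean = np = 10, var = np(1-p) = 20/3
r2 <- ev_discrete(function(x) x^2, x, pdf)   # mean = E(X^2) = Var(X) + E(X)^2 = 20/3 + 100
r1
r2
```

Because $g$ is an arbitrary R function, checking something like $E[\sqrt{X}]$ or $E[e^{X/10}]$ only requires passing a different function.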

For a million simulated observations, one can expect about three significant digits of accuracy for $E(X)$ and about two for $Var(X),$ which is in squared units.

set.seed(1112) # for reproducibility
x = rbinom(10^6, 30, 1/3)
mean(x)        # sample mean
[1] 9.9978     # aprx E(X) = 10
var(x)         # sample variance
[1] 6.648342   # aprx Var(X) = 20/3
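The accuracy claim can be checked roughly from the simulated sample itself by computing approximate 95% margins of error. For the mean this is $1.96\,s/\sqrt{n}.$ For the variance, the sketch below makes a rough large-sample assumption: it treats the sample variance as the mean of the squared deviations $(X_i - \bar X)^2$ and applies the same formula to those.

```r
set.seed(1112)                    # same seed as above
n <- 10^6
x <- rbinom(n, 30, 1/3)

# Approximate 95% margin of error for the sample mean
me_mean <- 1.96 * sd(x) / sqrt(n)

# Rough 95% margin of error for the sample variance, treating var(x) as
# (approximately) the mean of the squared deviations d = (x - mean(x))^2
d      <- (x - mean(x))^2
me_var <- 1.96 * sd(d) / sqrt(n)

c(mean = mean(x), me_mean = me_mean, var = var(x), me_var = me_var)
```

For this binomial the variance margin comes out several times larger than the mean margin (on the order of 0.02 versus 0.005), which is why the variance estimate carries about one significant digit fewer.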

(3) Suppose $X \sim \mathsf{Beta}(2, 3)$ with density function $f(x) = 12x(1-x)^2,$ for $0<x<1.$ Simple integration gives $E(X) = 2/5 = 0.4,$ $Var(X) = 1/25 = 0.04.$
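The simple integration, written out by expanding $(1-x)^2$ and integrating term by term:

$$E(X) = 12\int_0^1 (x^2 - 2x^3 + x^4)\,dx = 12\left(\tfrac13 - \tfrac12 + \tfrac15\right) = \tfrac{12}{30} = \tfrac25,$$

$$E(X^2) = 12\int_0^1 (x^3 - 2x^4 + x^5)\,dx = 12\left(\tfrac14 - \tfrac25 + \tfrac16\right) = \tfrac{12}{60} = \tfrac15, \qquad Var(X) = \tfrac15 - \left(\tfrac25\right)^2 = \tfrac{1}{25}.$$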

For $E(X),$ an approximation of the Riemann integral $\int_0^1 xf(x)\, dx$ in R, summing the areas of 10,000 rectangles whose heights are taken at the midpoints of the subintervals, is as follows:

m = 10^4;  a = 0;  b = 1;  w = (b-a)/m
g = seq(a+w/2, b-w/2, len=m)
h = g * 12*g*(1-g)^2
sum(w * h)
[1] 0.4

More generally, there is an integrate procedure in R, which requires an appropriately defined function.
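A minimal sketch of using integrate for the Beta(2, 3) example. Here f is written out by hand (dbeta(x, 2, 3) would give the same values), and the variance is obtained as $E(X^2) - E(X)^2$; the last line shows that an infinite upper limit also works, using an exponential with an arbitrarily chosen rate of 2:

```r
# Beta(2, 3) density written out explicitly; dbeta(x, 2, 3) gives the same values
f <- function(x) 12 * x * (1 - x)^2

# integrate() expects a vectorized function of one argument; $value extracts the result
ex  <- integrate(function(x) x   * f(x), lower = 0, upper = 1)$value   # E(X)
ex2 <- integrate(function(x) x^2 * f(x), lower = 0, upper = 1)$value   # E(X^2)
vx  <- ex2 - ex^2                                                      # Var(X)
c(mean = ex, var = vx)            # approximately 0.4 and 0.04

# Unbounded support: E(X) for Exp(rate = 2), which should be 1/2
e_exp <- integrate(function(x) x * dexp(x, rate = 2), lower = 0, upper = Inf)$value
e_exp
```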

One simulation method:

set.seed(2020)
x = rbeta(10^6, 2, 3)
mean(x);  var(x)
[1] 0.3998891   # aprx E(X) = 0.4
[1] 0.03996535  # aprx Var(X) = 0.04

BruceET
  • I appreciate your answer. I was just thinking about this sentence: "For a million simulated observations one can expect about three significant digits of accuracy." How can you tell that for $10^6$ observations the mean is accurate to 3 significant digits but the variance only to 2? I am thinking of the weak law of large numbers, but then wouldn't that accuracy only hold with a certain probability? – outofthegreen Dec 12 '20 at 11:47
  • Long story about the LLN. Quick rough version: at the end of the binomial simulation, mean(x) estimates $E(X),$ and the estimated 95% margin of error for that estimate is given by the code 1.96*sd(x)/sqrt(10^6), which returns $0.005053738.$ – BruceET Dec 12 '20 at 16:48