Most Popular

1500 questions
89
votes
9 answers

Mathematician wants the equivalent knowledge to a quality stats degree

I know people love to close duplicates so I am not asking for a reference to start learning statistics (as here). I have a doctorate in mathematics but never learned statistics. What is the shortest route to the equivalent knowledge to a top notch…
89
votes
14 answers

What is the meaning of "All models are wrong, but some are useful"

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 424, Wiley. ISBN 0471810339. What exactly is the meaning of the above phrase?
gpuguy
  • 1,123
89
votes
1 answer

Variance of product of multiple independent random variables

We know the answer for two independent variables: $$ {\rm Var}(XY) = E(X^2Y^2) - (E(XY))^2={\rm Var}(X){\rm Var}(Y)+{\rm Var}(X)(E(Y))^2+{\rm Var}(Y)(E(X))^2$$ However, if we take the product of more than two variables, ${\rm Var}(X_1X_2 \cdots…
damla
  • 1,041
  • 1
  • 9
  • 5
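
As a quick sanity check of the two-variable identity quoted in the excerpt above, here is a minimal sketch that verifies it exactly on a small example. The distributions `X` and `Y` and the helpers `mean`, `var`, and `product_dist` are illustrative assumptions, not part of the original question; exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())


def var(dist):
    """Variance of a finite distribution given as {value: probability}."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())


def product_dist(dx, dy):
    """Distribution of X*Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x * y] = out.get(x * y, Fraction(0)) + px * py
    return out


# Hypothetical example distributions (assumed for illustration only).
X = {1: Fraction(1, 2), 3: Fraction(1, 2)}
Y = {0: Fraction(1, 4), 2: Fraction(1, 2), 5: Fraction(1, 4)}

# Left-hand side: Var(XY) computed directly from the product distribution.
lhs = var(product_dist(X, Y))

# Right-hand side: Var(X)Var(Y) + Var(X)E(Y)^2 + Var(Y)E(X)^2.
rhs = var(X) * var(Y) + var(X) * mean(Y) ** 2 + var(Y) * mean(X) ** 2

print(lhs, rhs)
```

Both sides agree exactly for this example. The check only covers the two-variable case; it says nothing about the product of more than two variables that the question goes on to ask about.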
89
votes
5 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+Tensorflow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…
kedarps
  • 3,542
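
To illustrate the two losses named in the excerpt above, the following is a rough sketch in plain Python. It is not the Keras implementation: the two functions are hypothetical stand-ins that mirror the usual definitions, assuming the dense variant takes a one-hot target vector and the sparse variant takes an integer class index.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy with a one-hot target vector (illustrative, not Keras)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy with an integer class index (illustrative, not Keras)."""
    return -math.log(probs[label])


# Assumed example: predicted class probabilities for a 3-class problem.
probs = [0.1, 0.7, 0.2]

# The same target expressed in the two encodings.
label = 1
one_hot = [0.0, 1.0, 0.0]

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(label, probs)

print(dense_loss, sparse_loss)
```

Under these assumptions the two values coincide, which suggests the difference between the losses lies in how the labels are encoded rather than in the quantity being computed.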
89
votes
4 answers

How to produce a pretty plot of the results of k-means cluster analysis?

I'm using R to do K-means clustering. I'm using 14 variables to run K-means. What is a pretty way to plot the results of K-means? Are there any existing implementations? Does having 14 variables complicate plotting the results? I found something…
JEquihua
  • 3,835
89
votes
24 answers

Rules of thumb for "modern" statistics

I like G van Belle's book on Statistical Rules of Thumb, and to a lesser extent Common Errors in Statistics (and How to Avoid Them) from Phillip I Good and James W. Hardin. They address common pitfalls when interpreting results from experimental and…
chl
  • 53,725
89
votes
7 answers

What are the 'big problems' in statistics?

Mathematics has its famous Millennium Problems (and, historically, Hilbert's 23), questions that helped to shape the direction of the field. I have little idea, though, what the Riemann Hypotheses and P vs. NP's of statistics would be. So, what are…
raegtin
  • 9,930
88
votes
5 answers

Central limit theorem for sample medians

If I calculate the median of a sufficiently large number of observations drawn from the same distribution, does the central limit theorem state that the distribution of medians will approximate a normal distribution? My understanding is that this is…
user1728853
  • 1,077
88
votes
5 answers

Intuition on the Kullback–Leibler (KL) Divergence

I have learned about the intuition behind the KL Divergence as how much a model distribution function differs from the theoretical/true distribution of the data. The source I am reading goes on to say that the intuitive understanding of 'distance'…
cgo
  • 9,107
88
votes
4 answers

What's so 'moment' about 'moments' of a probability distribution?

I KNOW what moments are and how to calculate them and how to use the moment generating function for getting higher order moments. Yes, I know the math. Now that I need to get my statistics knowledge lubricated for work, I thought I might as well ask…
PhD
  • 14,627
88
votes
3 answers

What is the lasso in regression analysis?

I'm looking for a non-technical definition of the lasso and what it is used for.
Paul Vogt
  • 881
88
votes
14 answers

Why haven't robust (and resistant) statistics replaced classical techniques?

When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions so you never actually know. For instance, that so…
doug
  • 10,549
  • 1
  • 26
  • 26
87
votes
4 answers

How to visualize what canonical correlation analysis does (in comparison to what principal component analysis does)?

Canonical correlation analysis (CCA) is a technique related to principal component analysis (PCA). While it is easy to teach PCA or linear regression using a scatter plot (see a few thousand examples on google image search), I have not seen a…
figure
  • 973
  • 2
  • 7
  • 6
87
votes
5 answers

What are the advantages of the Wasserstein metric compared to the Kullback-Leibler divergence?

What is the practical difference between the Wasserstein metric and the Kullback-Leibler divergence? The Wasserstein metric is also referred to as the Earth mover's distance. From Wikipedia: Wasserstein (or Vaserstein) metric is a distance function defined between…
87
votes
2 answers

Likelihood ratio vs Bayes Factor

I'm rather evangelistic with regard to the use of likelihood ratios for representing the objective evidence for/against a given phenomenon. However, I recently learned that the Bayes factor serves a similar function in the context of Bayesian…
Mike Lawrence
  • 13,793