Most Popular

1500 questions
52 votes, 3 answers

Logistic regression vs. LDA as two-class classifiers

I am trying to wrap my head around the statistical difference between linear discriminant analysis and logistic regression. Is my understanding right that, for a two-class classification problem, LDA predicts two normal density functions (one for…
user1885116 • 2,318
52 votes, 3 answers

What is Deviance? (specifically in CART/rpart)

What is "Deviance," how is it calculated, and what are its uses in different fields in statistics? In particular, I'm personally interested in its uses in CART (and its implementation in rpart in R). I'm asking this since the wiki-article seems…
Tal Galili • 21,541
52 votes, 2 answers

Random forest assumptions

I am kind of new to random forests, so I am still struggling with some basic concepts. In linear regression, we assume independent observations, constant variance… What are the basic assumptions/hypotheses we make when we use a random forest? …
52 votes, 9 answers

Statistical tests when sample size is 1

I'm a high school math teacher who is a bit stumped. A Biology student came to me with his experiment wanting to know what kind of statistical analysis he can do with his data (yes, he should have decided that BEFORE the experiment, but I wasn't…
52 votes, 15 answers

A smaller dataset is better: Is this statement false in statistics? How to refute it properly?

Dr. Raoult, who promotes hydroxychloroquine, has made a really intriguing statement about statistics in the biomedical field: It's counterintuitive, but the smaller the sample size of a clinical test, the more significant its results are. The…
52 votes, 6 answers

Motivation for Kolmogorov distance between distributions

There are many ways to measure how similar two probability distributions are. Among methods which are popular (in different circles) are: the Kolmogorov distance: the sup-distance between the distribution functions; the Kantorovich-Rubinstein…
Mark Meckes • 3,126
52 votes, 8 answers

What is a good resource on table design?

I've seen various theoretical treatments of graphics, such as the Grammar of Graphics. But I have seen nothing equivalent with regard to tables. Over time I have developed an informal model of good practice in table design. However, I'd like…
Jeromy Anglim • 44,984
52 votes, 3 answers

Maximum Likelihood Estimators - Multivariate Gaussian

Context: The multivariate Gaussian appears frequently in machine learning, and the following results are used in many ML books and courses without the derivations. Given data in the form of a matrix $\mathbf{X}$ of dimensions $m \times p$, if we…
52 votes, 4 answers

When should I use a variational autoencoder as opposed to an autoencoder?

I understand the basic structure of the variational autoencoder and the normal (deterministic) autoencoder and the math behind them, but when and why would I prefer one type of autoencoder to the other? All I can think of is the prior distribution of…
DiveIntoML • 2,033
52 votes, 4 answers

What is the rationale of the Matérn covariance function?

The Matérn covariance function is commonly used as a kernel function in Gaussian processes. It is defined as $$ {\displaystyle C_{\nu }(d)=\sigma ^{2}{\frac {2^{1-\nu }}{\Gamma (\nu )}}{\Bigg (}{\sqrt {2\nu }}{\frac {d}{\rho }}{\Bigg )}^{\nu…
52 votes, 7 answers

Why does Andrew Ng prefer to use SVD and not EIG of covariance matrix to do PCA?

I am studying PCA from Andrew Ng's Coursera course and other materials. In the Stanford NLP course cs224n's first assignment, and in the lecture video from Andrew Ng, they do singular value decomposition instead of eigenvector decomposition of…
DongukJu • 663
52 votes, 6 answers

How to perform a test using R to see if data follows normal distribution

I have a data set with the following structure: a word | number of occurrences of the word in a document | a document id. How can I perform a test for normal distribution in R? It is probably an easy question, but I am an R newbie.
Skarab • 987
52 votes, 1 answer

Variational inference versus MCMC: when to choose one over the other?

I think I get the general idea of both VI and MCMC, including the various flavors of MCMC like Gibbs sampling, Metropolis-Hastings, etc. This paper provides a wonderful exposition of both methods. I have the following questions: If I wish to do…
kedarps • 3,542
52 votes, 4 answers

Cumming (2008) claims that the distribution of p-values obtained in replications depends only on the original p-value. How can it be true?

I have been reading Geoff Cumming's 2008 paper Replication and $p$ Intervals: $p$ values predict the future only vaguely, but confidence intervals do much better [~200 citations in Google Scholar] -- and am confused by one of its central claims.…
amoeba • 104,745
52 votes, 1 answer

How does the Adam method of stochastic gradient descent work?

I'm familiar with basic gradient descent algorithms for training neural networks. I've read the paper proposing Adam: ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION. While I've definitely got some insights (at least), the paper seems to be too high…
daniel451 • 2,915