Most Popular

1500 questions
143
votes
4 answers

Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?

Somebody asked me this question in a job interview and I replied that their joint distribution is always Gaussian. I thought that I could always write a bivariate Gaussian with their means, variances, and covariance. I am wondering if there can be a…
MarkSAlen
  • 2,927
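A minimal sketch of one standard construction relevant to this question, not taken from the thread itself: let $X \sim N(0,1)$ and $Y = SX$, where $S$ is a random sign independent of $X$. The variable names, seed, and sample size below are illustrative assumptions. Each marginal is standard normal, yet $X + Y$ equals 0 about half the time, which could not happen if the pair were jointly Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n = 100_000

x = rng.standard_normal(n)            # X ~ N(0, 1)
s = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of X
y = s * x                             # Y ~ N(0, 1) by symmetry of the normal

# For a jointly Gaussian pair, every linear combination is Gaussian
# (or constant). Here X + Y is exactly 0 whenever s == -1.
frac_zero = np.mean(x + y == 0.0)

print(f"mean(Y) = {y.mean():.3f}, std(Y) = {y.std():.3f}")
print(f"fraction of samples with X + Y == 0: {frac_zero:.3f}")
```

The point mass at 0 in $X + Y$ is what rules out a bivariate Gaussian, even though a histogram of either variable alone looks normal.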
142
votes
9 answers

Is Facebook coming to an end?

Recently, this paper has received a lot of attention (e.g. from WSJ). Basically, the authors conclude that Facebook will lose 80% of its members by 2017. They base their claims on an extrapolation of the SIR model, a compartmental model frequently…
142
votes
8 answers

Why does the Cauchy distribution have no mean?

From the density function we could identify a mean (= 0) for the Cauchy distribution, just as the graph below shows. But why do we say the Cauchy distribution has no mean?
Flying pig
  • 6,239
141
votes
9 answers

Obtaining knowledge from a random forest

Random forests are considered to be black boxes, but recently I was wondering what knowledge can be obtained from a random forest. The most obvious thing is the importance of the variables; in the simplest variant it can be done just by calculating…
140
votes
3 answers

What if residuals are normally distributed, but y is not?

I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left-skewed. Thus you assume that $u$ is not normally distributed, because this would…
MarkDollar
  • 5,955
140
votes
4 answers

What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders?

Recently I have been reading about deep learning and I am confused about the terms (or, say, technologies). What is the difference between convolutional neural networks (CNN), restricted Boltzmann machines (RBM), and auto-encoders?
RockTheStar
  • 12,907
139
votes
10 answers

Bias and variance in leave-one-out vs K-fold cross validation

How do different cross-validation methods compare in terms of model variance and bias? My question is partly motivated by this thread: Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice? The answer…
139
votes
3 answers

What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression? When would you use each?
B Seven
  • 2,913
139
votes
6 answers

How is it possible that validation loss is increasing while validation accuracy is increasing as well?

I am training a simple neural network on the CIFAR10 dataset. After some time, validation loss started to increase while validation accuracy kept increasing as well. The test loss and test accuracy continue to improve. How is this possible? It seems…
139
votes
8 answers

How to choose between a t-test and a non-parametric test (e.g. Wilcoxon) in small samples

Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed rank test, the Wilcoxon-Mann-Whitney U test, or the…
Silverfish
  • 23,353
138
votes
8 answers

Is it necessary to scale the target value in addition to scaling features for regression analysis?

I'm building regression models. As a preprocessing step, I scale my feature values to have mean 0 and standard deviation 1. Is it necessary to normalize the target values also?
user2806363
  • 2,723
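A minimal sketch of the preprocessing step this question describes, scaling each feature column to mean 0 and standard deviation 1. The helper name `standardize` and the synthetic data are assumptions made for illustration. The sketch does not settle whether the target should be scaled too.

```python
import numpy as np

def standardize(a):
    """Scale each column of `a` to mean 0 and std 1.

    Returns the scaled array together with the column means and
    standard deviations, so the transform can be reused or undone.
    """
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    return (a - mean) / std, mean, std

# Synthetic features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, -3.0], scale=[5.0, 0.1], size=(200, 2))

X_scaled, X_mean, X_std = standardize(X)

# New data should be scaled with the statistics from the training set.
X_new = np.array([[12.0, -2.9]])
X_new_scaled = (X_new - X_mean) / X_std
```

If one did choose to scale the target as well, the same helper could be applied to `y`, with the stored mean and standard deviation used to map predictions back to the original units.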
137
votes
4 answers

Nested cross validation for model selection

How can one use nested cross validation for model selection? From what I read online, nested CV works as follows: There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold for every available model, e.g. combination of…
136
votes
14 answers

What's wrong with XKCD's Frequentists vs. Bayesians comic?

This xkcd comic (Frequentists vs. Bayesians) makes fun of a frequentist statistician who derives an obviously wrong result. However, it seems to me that his reasoning is actually correct in the sense that it follows the standard frequentist…
repied2
  • 1,667
136
votes
7 answers

Is there an intuitive interpretation of $A^TA$ for a data matrix $A$?

For a given data matrix $A$ (with variables in columns and data points in rows), it seems like $A^TA$ plays an important role in statistics. For example, it is an important part of the analytical solution of ordinary least squares. Or, for PCA, its…
Alec
  • 2,385
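For reference, the roles of $A^TA$ mentioned in this question can be written out as follows. This is a sketch using assumed notation not present in the excerpt: $a_{ij}$ is the entry of $A$ in row $i$ and column $j$, $n$ is the number of rows, $y$ is the response vector, and $\hat\beta$ is the least-squares coefficient vector. The second and third lines also assume, respectively, that $A^TA$ is invertible and that the columns of $A$ have been centered.

$$
\begin{aligned}
(A^TA)_{jk} &= \sum_{i=1}^{n} a_{ij}\,a_{ik} && \text{(inner product of columns } j \text{ and } k\text{)} \\
\hat\beta &= (A^TA)^{-1}A^Ty && \text{(ordinary least squares)} \\
\widehat{\operatorname{Cov}} &= \tfrac{1}{n-1}\,A^TA && \text{(sample covariance, centered columns)}
\end{aligned}
$$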
134
votes
5 answers

How does a Support Vector Machine (SVM) work?

How does a Support Vector Machine (SVM) work, and what differentiates it from other linear classifiers, such as the Linear Perceptron, Linear Discriminant Analysis, or Logistic Regression? * (* I'm thinking in terms of the underlying motivations for…
tdc
  • 7,569