Most Popular

1500 questions
60
votes
7 answers

Interview question: If correlation doesn't imply causation, how do you detect causation?

I got this question in an interview: If correlation doesn't imply causation, how do you detect causation? My answer was: You do some form of A/B testing. The interviewer kept prodding me for another approach but I couldn't think of any, and he…
60
votes
3 answers

Where does the misconception that Y must be normally distributed come from?

Seemingly reputable sources claim that the dependent variable must be normally distributed: Model assumptions: $Y$ is normally distributed, errors are normally distributed, $e_i \sim N(0,\sigma^2)$, and independent, and $X$ is fixed, and …
colorlace
  • 1,060
  • 12
  • 25
60
votes
12 answers

Resources for learning Markov chain and hidden Markov models

I am looking for resources (tutorials, textbooks, webcasts, etc.) to learn about Markov chains and HMMs. My background is as a biologist, and I'm currently involved in a bioinformatics-related project. Also, what are the necessary mathematical…
bow
  • 121
60
votes
2 answers

Explanation of min_child_weight in xgboost algorithm

The definition of the min_child_weight parameter in xgboost is given as: minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than…
60
votes
5 answers

What is the difference between N and N-1 in calculating population variance?

I don't understand why there are both N and N-1 when calculating population variance. When do we use N and when do we use N-1? It says that when the population is very big there is no difference between N and N-1 but it does not…
ilhan
  • 984
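The two divisors in the excerpt above can be compared directly. This is a minimal sketch using Python's standard `statistics` module; the data values are made up purely for illustration and do not come from the question.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# Divides the sum of squared deviations by N
# (treats the data as the whole population).
pop_var = statistics.pvariance(data)   # 32 / 8 = 4.0

# Divides by N - 1 (treats the data as a sample from a larger population).
samp_var = statistics.variance(data)   # 32 / 7

print(f"divide by N:   {pop_var:.4f}")
print(f"divide by N-1: {samp_var:.4f}")

# The two results differ by the factor N / (N - 1),
# which approaches 1 as N grows.
for size in (10, 1_000, 100_000):
    print(size, size / (size - 1))
```

The loop at the end illustrates the point quoted in the excerpt: for a very large N the choice of divisor barely changes the result.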
60
votes
4 answers

Can a random forest be used for feature selection in multiple linear regression?

Since RF can handle non-linearity but can't provide coefficients, would it be wise to use random forest to gather the most important features and then plug those features into a multiple linear regression model in order to obtain their coefficients?…
60
votes
9 answers

Are we exaggerating the importance of model assumptions and evaluation in an era when analyses are often carried out by laymen?

Bottom line, the more I learn about statistics, the less I trust published papers in my field; I simply believe that researchers are not doing their statistics well enough. I'm a layman, so to speak. I'm trained in biology but I have no formal…
60
votes
4 answers

How does linear regression use the normal distribution?

In linear regression, each predicted value is assumed to have been picked from a normal distribution of possible values. See below. But why is each predicted value assumed to have come from a normal distribution? How does linear regression use this…
luciano
  • 14,269
60
votes
2 answers

What is maxout in neural network?

Can anyone explain what maxout units in a neural network do? How do they perform and how do they differ from conventional units? I tried to read the 2013 "Maxout Network" paper by Goodfellow et al. (from Professor Yoshua Bengio's group), but I don't…
RockTheStar
  • 12,907
  • 34
  • 71
  • 96
60
votes
4 answers

What is perplexity?

I came across the term perplexity, which refers to the log-averaged inverse probability on unseen data. The Wikipedia article on perplexity does not give an intuitive meaning for it. This perplexity measure was used in the pLSA paper. Can anyone explain…
Learner
  • 4,457
59
votes
3 answers

When combining p-values, why not just averaging?

I recently learned about Fisher's method to combine p-values. This is based on the fact that a p-value under the null follows a uniform distribution, and that $$-2\sum_{i=1}^n{\log X_i} \sim \chi^2(2n), \text{ given independent } X_i \sim \text{Unif}(0,1)$$ which I…
Alby
  • 2,223
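The distributional fact quoted in the excerpt above can be checked with a small simulation. This is only a sketch using the standard library: the function names `fisher_statistic` and `simulate_null`, the choice of n = 5, and the number of trials are all assumptions made for illustration. It compares the simulated mean and variance with those of a chi-square distribution with 2n degrees of freedom (2n and 4n respectively), and says nothing about how Fisher's method compares with averaging.

```python
import math
import random


def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum(log p_i)."""
    return -2.0 * sum(math.log(p) for p in p_values)


def simulate_null(n, trials, seed=0):
    """Draw `trials` sets of n independent Unif(0,1) p-values
    and return the Fisher statistic for each set."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        # 1 - rng.random() lies in (0, 1], which avoids log(0).
        p_values = [1.0 - rng.random() for _ in range(n)]
        stats.append(fisher_statistic(p_values))
    return stats


n = 5
stats = simulate_null(n, trials=20_000)
mean = sum(stats) / len(stats)
var = sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)

print(f"sample mean     {mean:.2f}  (chi-square(2n) mean is {2 * n})")
print(f"sample variance {var:.2f}  (chi-square(2n) variance is {4 * n})")
```

Matching the first two moments does not prove the distributional claim, but it is a quick sanity check that the statistic behaves as stated under the null.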
59
votes
2 answers

Linear kernel and non-linear kernel for support vector machine?

When using a support vector machine, are there any guidelines on choosing a linear kernel vs. a nonlinear kernel, like RBF? I once heard that non-linear kernels tend not to perform well once the number of features is large. Are there any references on…
user3269
  • 5,152
  • 10
  • 46
  • 55
59
votes
5 answers

Generic sum of Gamma random variables

I have read that the sum of Gamma random variables with the same scale parameter is another Gamma random variable. I've also seen the paper by Moschopoulos describing a method for the summation of a general set of Gamma random variables. I have…
OSE
  • 1,217
59
votes
3 answers

Why does correlation matrix need to be positive semi-definite and what does it mean to be or not to be positive semi-definite?

I have been researching the meaning of the positive semi-definite property of correlation or covariance matrices. I am looking for any information on: the definition of positive semi-definiteness; its important properties and practical implications; the…
Melon
  • 599
59
votes
3 answers

Standard deviation of standard deviation

What is an estimator of the standard deviation of the standard deviation if normality of the data can be assumed?
user88