Most Popular
1500 questions
80
votes
5 answers
Covariance and independence?
I read from my textbook that $\text{cov}(X,Y)=0$ does not guarantee X and Y are independent. But if they are independent, their covariance must be 0. I could not think of any proper example yet; could someone provide one?
Flying pig
- 6,239
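One standard illustration for the covariance question above, sketched under assumptions that are not in the original post: take $X$ uniform on $\{-1, 0, 1\}$ and set $Y = X^2$. The names `support`, `expect`, and `cov_xy` are purely illustrative. Exact fractions are used so the zero covariance is not a rounding artifact.

```python
from fractions import Fraction

# Assumed toy distribution: X is -1, 0 or 1 with probability 1/3 each, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

def expect(f):
    """Expectation of f(X) under the toy distribution."""
    return sum(p * f(x) for x in support)

e_x = expect(lambda x: x)            # E[X]
e_y = expect(lambda x: x ** 2)       # E[Y]
e_xy = expect(lambda x: x * x ** 2)  # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y            # cov(X, Y)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = p            # Y = 0 exactly when X = 0
p_product = p * p      # P(X=0) * P(Y=0)

print("cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, "vs product of marginals =", p_product)
```

Here $Y$ is completely determined by $X$, so the two are dependent, yet the covariance vanishes because the relationship is symmetric rather than linear.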
79
votes
4 answers
Random Forest - How to handle overfitting
I have a computer science background but am trying to teach myself data science by solving problems on the internet.
I have been working on this problem for the last couple of weeks (approx 900 rows and 10 features). I was initially using logistic…
Abhi
- 1,409
78
votes
1 answer
KL divergence between two multivariate Gaussians
I'm having trouble deriving the KL divergence formula assuming two multivariate normal distributions. I've done the univariate case fairly easily. However, it's been quite a while since I took math stats, so I'm having some trouble extending it to…
dmartin
- 3,305
78
votes
9 answers
If A and B are correlated with C, why are A and B not necessarily correlated?
I know empirically that is the case. I have just developed models that run into this conundrum. I also suspect it is not necessarily a yes/no answer. I mean by that if both A and B are correlated with C, this may have some implication regarding…
Sympa
- 7,732
78
votes
5 answers
Understanding stratified cross-validation
I read in Wikipedia:
In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly…
Amelio Vazquez-Reina
- 19,346
78
votes
7 answers
Where to cut a dendrogram?
Hierarchical clustering can be represented by a dendrogram. Cutting a dendrogram at a certain level gives a set of clusters. Cutting at another level gives another set of clusters. How would you pick where to cut the dendrogram? Is there something…
Eduardas
- 2,329
78
votes
6 answers
Why is multicollinearity not checked in modern statistics/machine learning
In traditional statistics, while building a model, we check for multicollinearity using methods such as estimates of the variance inflation factor (VIF), but in machine learning, we instead use regularization for feature selection and don't seem to…
user
- 781
78
votes
21 answers
Free resources for learning R
I'm interested in learning R on the cheap. What's the best free resource/book/tutorial for learning R?
Yahel
- 555
78
votes
16 answers
Practical thoughts on explanatory vs. predictive modeling
Back in April, I attended a talk at the UMD (University of Maryland) Math Department Statistics group seminar series called "To Explain or To Predict?". The talk was given by Prof. Galit Shmueli who teaches at UMD's Smith Business School. Her talk…
wahalulu
- 171
78
votes
4 answers
Why is sample standard deviation a biased estimator of $\sigma$?
According to the Wikipedia article on unbiased estimation of standard deviation, the sample SD
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$$
is a biased estimator of the SD of the population. It states that $E(\sqrt{s^2}) \neq…
Dav Weps
- 797
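The sample SD formula quoted in the question above can be checked on a tiny case. The following is a minimal sketch, assuming a two-point population $\{0, 1\}$ with equal probabilities and samples of size 2; the population, the sample size, and the names `sample_sd`, `mean_s2`, and `mean_s` are all illustrative choices, not part of the original post. It enumerates every possible sample and compares the average of $s^2$ and of $s$ with the true values.

```python
import itertools
import math

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# Assumed toy population: 0 or 1 with probability 1/2 each, so sigma = 0.5.
population = [0.0, 1.0]
sigma = 0.5

# All equally likely samples of size 2 drawn with replacement.
samples = list(itertools.product(population, repeat=2))

mean_s2 = sum(sample_sd(s) ** 2 for s in samples) / len(samples)  # E[s^2]
mean_s = sum(sample_sd(s) for s in samples) / len(samples)        # E[s]

print("E[s^2] =", mean_s2, "sigma^2 =", sigma ** 2)
print("E[s]   =", mean_s, "sigma   =", sigma)
```

In this toy case the average of $s^2$ matches $\sigma^2$, while the average of $s$ falls below $\sigma$, which is consistent with the square root being a concave function.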
77
votes
1 answer
Impractical question: is it possible to find the regression line using a ruler and compass?
The ancient Greeks famously sought to construct geometrical relationships using only a ruler and a compass. Given a set of points in a two-dimensional plane, is it possible to find the OLS line using only such instruments?
This question has…
Pablo Derbez
- 818
77
votes
5 answers
What's the difference between momentum based gradient descent and Nesterov's accelerated gradient descent?
So momentum based gradient descent works as follows:
$v=\beta m-\eta g$
where $m$ is the previous weight update, and $g$ is the current gradient with respect to the parameters $p$, $\eta$ is the learning rate, and $\beta$ is a constant.
$p_{new} = p…
applecider
- 1,265
77
votes
5 answers
Why is ANOVA equivalent to linear regression?
I read that ANOVA and linear regression are the same thing. How can that be, considering that the output of ANOVA is some $F$ value and some $p$-value based on which you conclude if the sample means across the different samples are same or…
Victor
- 6,565
77
votes
15 answers
Why would parametric statistics ever be preferred over nonparametric?
Can someone explain to me why anyone would choose a parametric over a nonparametric statistical method for hypothesis testing or regression analysis?
In my mind, it's like going rafting and choosing a non-water-resistant watch, because you may…
en1
- 947
77
votes
4 answers
Maximum likelihood method vs. least squares method
What is the main difference between maximum likelihood estimation (MLE) and least squares estimation (LSE)?
Why can't we use MLE for predicting $y$ values in linear regression and vice versa?
Any help on this topic will be greatly appreciated.
evros
- 871