Most Popular
1500 questions
70
votes
6 answers
Warning in R - Chi-squared approximation may be incorrect
I have data showing firefighter entrance exam results. I am testing the hypothesis that exam results and ethnicity are not mutually independent. To test this, I ran a Pearson chi-square test in R. The results show what I expected, but it gave a…
ferrelwill
- 815
70
votes
3 answers
Proper way of using recurrent neural network for time series analysis
Recurrent neural networks differ from "regular" ones in that they have a "memory" layer. Because of this layer, recurrent NNs are supposed to be useful in time series modelling. However, I'm not sure I understand correctly how to use…
Boris Gorelik
- 2,707
70
votes
4 answers
What is the difference in Bayesian estimate and maximum likelihood estimate?
Please explain the difference between a Bayesian estimate and a maximum likelihood estimate.
triomphe
- 867
70
votes
1 answer
Wald test for logistic regression
As far as I understand, the Wald test in the context of logistic regression is used to determine whether a certain predictor variable $X$ is significant. It rejects the null hypothesis of the corresponding coefficient being zero.
The test…
user695652
- 1,591
70
votes
2 answers
What is the difference between a neural network and a deep belief network?
I am getting the impression that when people refer to a 'deep belief' network, they basically mean a very large neural network. Is this correct, or does a deep belief network also imply that the algorithm itself is different (i.e., no…
Vincent Warmerdam
- 1,149
70
votes
2 answers
What is the relationship between independent component analysis and factor analysis?
I am new to Independent Component Analysis (ICA) and have just a rudimentary understanding of the method. It seems to me that ICA is similar to Factor Analysis (FA) with one exception: ICA assumes that the observed random variables are a linear…
stats_student
- 833
70
votes
10 answers
Why is the sum of two random variables a convolution?
For a long time I did not understand why the "sum" of two random variables is their convolution, whereas a mixture density function sum of $f(x)$ and $g(x)$ is $p\,f(x)+(1-p)g(x)$; the arithmetic sum and not their convolution. The exact phrase "the…
Carl
- 13,084
70
votes
6 answers
Is ridge regression useless in high dimensions ($n \ll p$)? How can OLS fail to overfit?
Consider a good old regression problem with $p$ predictors and sample size $n$. The usual wisdom is that the OLS estimator will overfit and will generally be outperformed by the ridge regression estimator: $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top…
amoeba
- 104,745
70
votes
3 answers
When are Log scales appropriate?
I've read that using log scales when charting/graphing is appropriate in certain circumstances, like the y-axis in a time series chart. However, I've not been able to find a definitive explanation as to why that's the case, or when else it would be…
dav
- 1,551
70
votes
2 answers
What is the relationship between a chi squared test and test of equal proportions?
Suppose that I have three populations with four mutually exclusive characteristics. I take random samples from each population and construct a crosstab or frequency table for the characteristics that I am measuring. Am I correct in saying…
hgcrpd
- 1,427
70
votes
7 answers
Is chi-squared always a one-sided test?
A published article (pdf) contains these 2 sentences:
Moreover, misreporting may be caused by the application of incorrect rules or by a lack of knowledge of the statistical test. For example, the total df in an ANOVA may be taken to be the error…
Joel W.
- 3,306
70
votes
9 answers
How can I help ensure testing data does not leak into training data?
Suppose we have someone building a predictive model, but that someone is not necessarily well-versed in proper statistical or machine learning principles. Maybe we are helping that person as they are learning, or maybe that person is using some…
Michael McGowan
- 4,761
69
votes
6 answers
Why on average does each bootstrap sample contain roughly two thirds of observations?
I have run across the assertion that each bootstrap sample (or bagged tree) will contain on average approximately $2/3$ of the observations.
I understand that the chance of not being selected in any of $n$ draws from $n$ samples with replacement is…
xyzzy
- 983
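The "roughly two thirds" figure in the excerpt above can be checked numerically. The following is a minimal sketch and not part of the original question: it assumes the standard result that a given observation is missed by all $n$ draws with probability $(1 - 1/n)^n$, and the function names, the number of bootstrap replicates, and the seed are illustrative choices.

```python
import random


def expected_unique_fraction(n: int) -> float:
    """Probability that a given observation appears at least once
    in n draws with replacement from n observations."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n: int, n_boot: int = 200, seed: int = 0) -> float:
    """Average fraction of distinct observations across n_boot bootstrap samples."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    total = 0.0
    for _ in range(n_boot):
        # Draw n indices with replacement and keep only the distinct ones.
        distinct = {rng.randrange(n) for _ in range(n)}
        total += len(distinct) / n
    return total / n_boot


if __name__ == "__main__":
    n = 1000
    print("analytic :", expected_unique_fraction(n))
    print("simulated:", simulated_unique_fraction(n))
```

As $n$ grows, the analytic value approaches $1 - 1/e \approx 0.632$, which is slightly below $2/3$.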
69
votes
4 answers
How to derive variance-covariance matrix of coefficients in linear regression
I am reading a book on linear regression and have some trouble understanding the variance-covariance matrix of $\mathbf{b}$:
The diagonal items are easy enough, but the off-diagonal ones are a bit more difficult; what puzzles me is that…
qed
- 2,808
69
votes
9 answers
How to visualize what ANOVA does?
What ways are there to visually explain what ANOVA does?
Any references or links (R packages?) are welcome.
Tal Galili
- 21,541