Most Popular
1500 questions
56
votes
2 answers
Determining sample size necessary for bootstrap method / Proposed Method
I know this is a rather hot topic for which no one can really give a simple answer. Nevertheless, I am wondering whether the following approach could be useful.
The bootstrap method is only useful if your sample follows more or less (read exactly) the…
siegfried
56
votes
9 answers
Bayesian vs frequentist Interpretations of Probability
Can someone give a good rundown of the differences between the Bayesian and the frequentist approach to probability?
From what I understand:
The frequentist view is that the data is a repeatable random sample (random variable) with a specific…
BYS2
56
votes
5 answers
What is the difference between the forward-backward and Viterbi algorithms?
I want to know how the forward-backward algorithm and the Viterbi algorithm differ when used for inference in hidden Markov models (HMMs).
user34790
56
votes
7 answers
Best PCA algorithm for huge number of features (>10K)?
I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming.
I need to write some code to do PCA…
dsimcha
56
votes
8 answers
Danger of setting all initial weights to zero in Backpropagation
Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?
user8078
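A minimal sketch of the kind of simple example the question asks for. Everything here is an illustrative assumption rather than anything taken from the question or its answers: a tiny one-hidden-layer network without biases, sigmoid hidden units, squared error, and a made-up XOR-like target. With all-zero starting weights every hidden unit receives the same gradient at every step, so the units should never differentiate from one another.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, W2, X, y, lr=0.5, steps=200):
    """Full-batch gradient descent on squared error for a one-hidden-layer net (no biases)."""
    W1, W2 = W1.copy(), W2.copy()
    n = X.shape[0]
    for _ in range(steps):
        H = sigmoid(X @ W1)                     # hidden activations, shape (n, n_hidden)
        err = H @ W2 - y                        # prediction error, shape (n, 1)
        grad_W2 = H.T @ err / n
        grad_Z = (err @ W2.T) * H * (1.0 - H)   # backpropagate through the sigmoid
        grad_W1 = X.T @ grad_Z / n
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, [0]] * X[:, [1]] > 0).astype(float)   # hypothetical XOR-like target

n_hidden = 4
W1_zero, W2_zero = train(np.zeros((3, n_hidden)), np.zeros((n_hidden, 1)), X, y)
W1_rand, W2_rand = train(rng.normal(scale=0.5, size=(3, n_hidden)),
                         rng.normal(scale=0.5, size=(n_hidden, 1)), X, y)

# Compare every hidden unit's incoming weights with those of the first unit
zero_init_symmetric = np.allclose(W1_zero, W1_zero[:, [0]])
rand_init_symmetric = np.allclose(W1_rand, W1_rand[:, [0]])
print("hidden units identical after zero init:  ", zero_init_symmetric)
print("hidden units identical after random init:", rand_init_symmetric)
```

Under these assumptions the zero-initialized network behaves like a network with a single hidden unit, however many units it nominally has; small random initial weights break that symmetry.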
56
votes
11 answers
Deriving Bellman's Equation in Reinforcement Learning
I see the following equation in "Reinforcement Learning: An Introduction", but don't quite follow the step I have highlighted in blue below. How exactly is this step derived?
Amelio Vazquez-Reina
56
votes
6 answers
Understanding LSTM units vs. cells
I have been studying LSTMs for a while, and I understand at a high level how everything works. However, when implementing them in TensorFlow, I noticed that BasicLSTMCell requires a number-of-units parameter (i.e. num_units).
From this very…
user124589
56
votes
3 answers
Regularization methods for logistic regression
Regularization using methods such as Ridge, Lasso, and ElasticNet is quite common for linear regression. I wanted to know the following:
Are these methods applicable for logistic regression? If so, are there any differences in the way they need to be…
Tapan Khopkar
56
votes
9 answers
Modern successor to Exploratory Data Analysis by Tukey?
I've been reading Tukey's book "Exploratory Data Analysis". Having been written in 1977, the book emphasizes paper-and-pencil methods. Is there a more 'modern' successor which takes into account that we can now instantaneously plot large data sets?
biofreezer
56
votes
2 answers
Choosing the right linkage method for hierarchical clustering
I am performing hierarchical clustering on data I've gathered and processed from the Reddit data dump on Google BigQuery.
My process is the following:
Get the latest 1000 posts in /r/politics
Gather all the comments
Process the data and compute an…
Kevbot
56
votes
3 answers
What is pre training a neural network?
Well the question says it all.
What is meant by "pre training a neural network"? Can someone explain in pure simple English?
I can't seem to find any resources related to it. It would be great if someone can point me to them.
Machina333
56
votes
2 answers
Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost
I'm trying to understand the differences between GBM & Adaboost.
These are what I've understood so far:
They are both boosting algorithms, which learn from the previous models' errors and finally make a weighted sum of the models.
GBM and Adaboost…
Hee Kyung Yoon
56
votes
4 answers
If the t-test and the ANOVA for two groups are equivalent, why aren't their assumptions equivalent?
I'm sure I've got this completely wrapped round my head, but I just can't figure it out.
The t-test compares two normal distributions using the Z distribution. That's why there's an assumption of normality in the DATA.
ANOVA is equivalent to linear…
Chris Beeley
56
votes
4 answers
Replicating Stata's "robust" option in R
I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package and also the lmrob command from the "robustbase" package. In both cases the results are quite different from the "robust"…
user56579
55
votes
3 answers
Why is polynomial regression considered a special case of multiple linear regression?
If polynomial regression models nonlinear relationships, how can it be considered a special case of multiple linear regression?
Wikipedia notes that "Although polynomial regression fits a nonlinear model to the data, as a statistical estimation…
gavinmh
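A short sketch of why the question's premise holds, using synthetic data and variable names that are assumptions made for illustration. A quadratic fit is nonlinear in x but linear in its coefficients, so it can be estimated by ordinary least squares on a design matrix whose columns are 1, x, and x^2. The result should agree with NumPy's own polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)
# Hypothetical data: a quadratic trend plus a little Gaussian noise
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=50)

# Treat 1, x and x^2 as three separate regressors of a multiple linear regression
X = np.column_stack([np.ones_like(x), x, x**2])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# np.polyfit returns the highest-degree coefficient first, so reverse the order
beta_polyfit = np.polyfit(x, y, deg=2)[::-1]

print("OLS on [1, x, x^2]:", beta_ols)
print("np.polyfit        :", beta_polyfit)
```

The "linear" in linear regression refers to the unknown coefficients, not to the shape of the fitted curve in x.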