Most Popular

1500 questions
56
votes
2 answers

Determining sample size necessary for bootstrap method / Proposed Method

I know this is a rather hot topic for which no one can really give a simple answer. Nevertheless, I am wondering whether the following approach could be useful. The bootstrap method is only useful if your sample follows more or less (read exactly) the…
siegfried
  • 579
56
votes
9 answers

Bayesian vs frequentist Interpretations of Probability

Can someone give a good rundown of the differences between the Bayesian and the frequentist approaches to probability? From what I understand: The frequentist view is that the data is a repeatable random sample (random variable) with a specific…
BYS2
  • 1,505
56
votes
5 answers

What is the difference between the forward-backward and Viterbi algorithms?

I want to know the differences between the forward-backward algorithm and the Viterbi algorithm for inference in hidden Markov models (HMMs).
user34790
  • 6,757
56
votes
7 answers

Best PCA algorithm for huge number of features (>10K)?

I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming. I need to write some code to do PCA…
dsimcha
  • 8,739
56
votes
8 answers

Danger of setting all initial weights to zero in Backpropagation

Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?
user8078
  • 663
56
votes
11 answers

Deriving Bellman's Equation in Reinforcement Learning

I see the following equation in "Reinforcement Learning: An Introduction", but don't quite follow the step I have highlighted in blue below. How exactly is this step derived?
56
votes
6 answers

Understanding LSTM units vs. cells

I have been studying LSTMs for a while. I understand at a high level how everything works. However, when implementing them in TensorFlow, I noticed that BasicLSTMCell requires a number of units (i.e. num_units) parameter. From this very…
user124589
56
votes
3 answers

Regularization methods for logistic regression

Regularization using methods such as Ridge, Lasso, and ElasticNet is quite common for linear regression. I wanted to know the following: Are these methods applicable to logistic regression? If so, are there any differences in the way they need to be…
Tapan Khopkar
  • 846
56
votes
9 answers

Modern successor to Exploratory Data Analysis by Tukey?

I've been reading Tukey's book "Exploratory Data Analysis". Written in 1977, the book emphasizes paper/pencil methods. Is there a more 'modern' successor which takes into account that we can now instantaneously plot large data sets?
56
votes
2 answers

Choosing the right linkage method for hierarchical clustering

I am performing hierarchical clustering on data I've gathered and processed from the reddit data dump on Google BigQuery. My process is the following: get the latest 1000 posts in /r/politics; gather all the comments; process the data and compute an…
Kevbot
  • 661
56
votes
3 answers

What is pre training a neural network?

Well the question says it all. What is meant by "pre training a neural network"? Can someone explain in pure simple English? I can't seem to find any resources related to it. It would be great if someone can point me to them.
Machina333
  • 1,123
56
votes
2 answers

Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost

I'm trying to understand the differences between GBM & Adaboost. This is what I've understood so far: They are both boosting algorithms, which learn from the previous model's errors and finally make a weighted sum of the models. GBM and Adaboost…
56
votes
4 answers

If the t-test and the ANOVA for two groups are equivalent, why aren't their assumptions equivalent?

I'm sure I've got this completely wrapped round my head, but I just can't figure it out. The t-test compares two normal distributions using the Z distribution. That's why there's an assumption of normality in the DATA. ANOVA is equivalent to linear…
Chris Beeley
  • 5,761
56
votes
4 answers

Replicating Stata's "robust" option in R

I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package and also the command lmrob from the package "robustbase". In both cases the results are quite different from the "robust"…
user56579
  • 561
55
votes
3 answers

Why is polynomial regression considered a special case of multiple linear regression?

If polynomial regression models nonlinear relationships, how can it be considered a special case of multiple linear regression? Wikipedia notes that "Although polynomial regression fits a nonlinear model to the data, as a statistical estimation…
gavinmh
  • 1,095
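
For the polynomial regression question above, the usual reasoning is that the model is nonlinear in x but linear in its coefficients, so the powers of x can be treated as separate regressors in an ordinary multiple linear regression. The following is a minimal sketch of that idea, not taken from the original thread: the data, the true coefficients (1, 2, 3), and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Treat 1, x and x^2 as three separate regressors (design matrix columns).
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares: the fit is linear in the coefficients beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# np.polyfit returns the highest degree first, so reverse it to compare.
poly_coef = np.polyfit(x, y, deg=2)[::-1]

print(beta)
print(poly_coef)
```

Both routes solve the same least-squares problem, which is why the two coefficient vectors should agree up to floating-point error.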