Highest Voted Questions - Statistical Analysis Stack Exchange

127

votes

18 answers

Including the interaction but not the main effects in a model

Is it ever valid to include a two-way interaction in a model without including the main effects? What if your hypothesis is only about the interaction, do you still need to include the main effects?

asked May 20 '11 at 01:19

Glen

7,250

126

votes

1 answer

What is an ablation study? And is there a systematic way to perform it?

What is an ablation study? And is there a systematic way to perform it? For example, I have $n$ predictors in a linear regression which I will call my model. How would I perform an ablation study on this? What metrics should I use? A…

asked Dec 03 '18 at 09:09

cgo

9,107

125

votes

11 answers

Calculating optimal number of bins in a histogram

I'm interested in finding as optimal a method as I can for determining how many bins I should use in a histogram. My data should range from 30 to 350 objects at most, and in particular I'm trying to apply thresholding (like Otsu's method) where…

asked Jul 27 '10 at 15:21

Tony Stark

1,353

124

votes

3 answers

Does an unbalanced sample matter when doing logistic regression?

Okay, so I think I have a decent enough sample, taking into account the 20:1 rule of thumb: a fairly large sample (N=374) for a total of 7 candidate predictor variables. My problem is the following: whatever set of predictor variables I use, the…

asked Jan 07 '11 at 16:48

Michiel

1,343

124

votes

3 answers

Intuitive explanation of unit root

How would you explain intuitively what a unit root is, in the context of the unit root test? I'm thinking of ways of explaining it much like I found in this question. The case with unit root is that I know (little, by the way) that the unit root…

asked May 24 '12 at 22:07

Lucas Reis

2,062

124

votes

7 answers

Why use gradient descent for linear regression, when a closed-form math solution is available?

I am taking the Machine Learning courses online and learnt about Gradient Descent for calculating the optimal values in the hypothesis h(x) = B0 + B1X. Why do we need to use Gradient Descent if we can easily find the values with the formula below?…

asked May 10 '17 at 16:52

Purus

1,343
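
The formula referred to in the excerpt above is cut off, so the following is only a minimal sketch that assumes it is the usual least-squares solution for h(x) = B0 + B1X. It fits the same line twice, once with a direct least-squares solve and once with plain batch gradient descent, so the two routes can be compared. The synthetic data, learning rate, and step count are illustrative assumptions, not values from the question.

```python
import numpy as np

def fit_closed_form(x, y):
    """Least-squares fit of h(x) = B0 + B1*x using a direct solve."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_gradient_descent(x, y, lr=0.1, n_steps=5000):
    """Same fit by batch gradient descent on the mean squared error."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_steps):
        grad = 2.0 / len(y) * X.T @ (X @ beta - y)   # gradient of mean((X beta - y)^2)
        beta -= lr * grad
    return beta

# Synthetic data (an assumption for illustration): true B0 = 1.5, B1 = 2.0
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.5 + 2.0 * x + rng.normal(scale=0.1, size=200)

beta_cf = fit_closed_form(x, y)
beta_gd = fit_gradient_descent(x, y)
print(beta_cf, beta_gd)
```

On a small, well-conditioned problem like this one, both routes should land on essentially the same coefficients. The iterative route only starts to pay off when the number of predictors or rows makes a direct solve expensive.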

122

votes

3 answers

tanh activation function vs sigmoid activation function

The tanh activation function is: $$\tanh \left( x \right) = 2 \cdot \sigma \left( 2 x \right) - 1$$ where $\sigma(x)$, the sigmoid function, is defined as: $$\sigma(x) = \frac{e^x}{1 + e^x}$$ Questions: Does it really matter between using those…

asked Jun 08 '14 at 06:11

satya

1,373
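
The identity quoted in the excerpt above, $\tanh(x) = 2\sigma(2x) - 1$, can be checked numerically. This is a small sketch using NumPy; the helper names `sigmoid` and `tanh_via_sigmoid` are illustrative and do not come from the question.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigma(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

xs = np.linspace(-5.0, 5.0, 101)
max_gap = np.max(np.abs(tanh_via_sigmoid(xs) - np.tanh(xs)))
print(max_gap)  # expected to be at floating-point rounding level
```

So tanh is a rescaled and shifted sigmoid: its outputs lie in (-1, 1) and are centred on zero, whereas the sigmoid's outputs lie in (0, 1).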

121

votes

4 answers

Why does the Lasso provide Variable Selection?

I've been reading Elements of Statistical Learning, and I would like to know why the Lasso provides variable selection and ridge regression doesn't. Both methods minimize the residual sum of squares and have a constraint on the possible values of…

asked Nov 04 '13 at 14:39

Shiwen

1,422

121

votes

5 answers

How do you calculate precision and recall for multiclass classification using confusion matrix?

I wonder how to compute precision and recall using a confusion matrix for a multi-class classification problem. Specifically, an observation can only be assigned to its most probable class / label. I would like to compute: Precision = TP / (TP+FP)…

asked Mar 04 '13 at 15:56

daiyue

1,321
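
As a rough sketch of the per-class calculation asked about above: treat each class in turn as "positive" and read TP, FP and FN off the confusion matrix. The code assumes rows are true classes and columns are predicted classes (some libraries use the transpose), and the example matrix is made up for illustration.

```python
import numpy as np

def per_class_precision_recall(cm):
    """Per-class precision and recall from a square confusion matrix.

    Assumes cm[i, j] counts observations of true class i predicted as class j.
    A class with no predictions (or no true members) would give a division
    by zero here; handle that case separately if it can occur.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted as class k
    fp = cm.sum(axis=0) - tp         # predicted as k, but truly another class
    fn = cm.sum(axis=1) - tp         # truly k, but predicted as another class
    precision = tp / (tp + fp)       # Precision = TP / (TP + FP)
    recall = tp / (tp + fn)          # Recall    = TP / (TP + FN)
    return precision, recall

# Hypothetical 3-class confusion matrix
cm = [[5, 1, 0],
      [2, 6, 2],
      [0, 1, 8]]
precision, recall = per_class_precision_recall(cm)
print(precision)  # 5/7, 6/8, 8/10
print(recall)     # 5/6, 6/10, 8/9
```

Averaging these per-class values with equal weights gives macro-averaged precision and recall. Weighting each class by its number of true members gives the weighted average.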

121

votes

21 answers

At each step of a limiting infinite process, put 10 balls in an urn and remove one at random. How many balls are left?

The question (slightly modified) goes as follows and if you have never encountered it before you can check it in example 6a, chapter 2, of Sheldon Ross' A First Course in Probability: Suppose that we possess an infinitely large urn and an infinite …

asked Nov 24 '17 at 18:23

Carlos Cinelli

12,552

121

votes

5 answers

Comprehensive list of activation functions in neural networks with pros/cons

Are there any reference document(s) that give a comprehensive list of activation functions in neural networks along with their pros/cons (and ideally some pointers to publications where they were successful or not so successful)?

asked Sep 12 '14 at 13:28

Franck Dernoncourt

46,817

120

votes

6 answers

Is it possible to train a neural network without backpropagation?

Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient. Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…

asked Sep 20 '16 at 01:48

Haitao Du

36,852

120

votes

9 answers

How does the reparameterization trick for VAEs work and why is it important?

How does the reparameterization trick for variational autoencoders (VAE) work? Is there an intuitive and easy explanation without simplifying the underlying math? And why do we need the 'trick'?

asked Mar 02 '16 at 20:10

David Dao

2,824
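
A minimal numerical sketch of the idea behind the question above, outside any actual VAE: write a Gaussian sample as $z = \mu + \sigma \epsilon$ with $\epsilon \sim N(0, 1)$, so the randomness no longer depends on the parameters and gradients can pass through $z$. The toy objective $E[z^2]$ is an assumption chosen because its exact gradients, $2\mu$ and $2\sigma$, are known. The variable names and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
n_samples = 200_000

# Reparameterized sampling: noise is drawn independently of (mu, sigma)
eps = rng.standard_normal(n_samples)
z = mu + sigma * eps

# Toy objective f(z) = z^2, so E[f(z)] = mu^2 + sigma^2.
# Chain rule through z: df/dmu = f'(z) * 1, df/dsigma = f'(z) * eps.
df_dz = 2.0 * z
grad_mu_est = np.mean(df_dz)            # Monte Carlo estimate of 2 * mu
grad_sigma_est = np.mean(df_dz * eps)   # Monte Carlo estimate of 2 * sigma

print(grad_mu_est, 2.0 * mu)
print(grad_sigma_est, 2.0 * sigma)
```

In a VAE the same chain rule is applied by automatic differentiation, with $\mu$ and $\sigma$ produced by the encoder network instead of being fixed constants as they are here.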

119

votes

4 answers

Assessing approximate distribution of data based on a histogram

Suppose I want to see whether my data is exponential based on a histogram (i.e. skewed to the right). Depending on how I group or bin the data, I can get wildly different histograms. One set of histograms will make it seem that the data is…

asked Mar 08 '13 at 17:58

guestoeijreor

1,191

119

votes

5 answers

Mean absolute error OR root mean squared error?

Why use Root Mean Squared Error (RMSE) instead of Mean Absolute Error (MAE)? I've been investigating the error generated in a calculation - I initially calculated the error as a Root Mean Normalised Squared Error. Looking a little closer, I see the…

asked Jan 22 '13 at 17:11

user1665220

1,325
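
One way to see the practical difference between the two metrics asked about above is to compare two error vectors with the same MAE but different spread. This is a small sketch with made-up numbers; the names `even` and `spiky` are illustrative.

```python
import numpy as np

def mae(errors):
    """Mean absolute error."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Root mean squared error."""
    return np.sqrt(np.mean(np.square(errors)))

even = np.array([2.0, -2.0, 2.0, -2.0])   # every error has the same size
spiky = np.array([0.0, 0.0, 0.0, 8.0])    # one large error, the rest zero

print(mae(even), rmse(even))     # 2.0 2.0
print(mae(spiky), rmse(spiky))   # 2.0 4.0
```

Both vectors have the same MAE, but squaring gives the single large error more weight, so RMSE is higher for the second one. RMSE is never smaller than MAE, and the two are equal only when all errors have the same magnitude.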
