Most Popular
1500 questions
127 votes, 18 answers
Including the interaction but not the main effects in a model
Is it ever valid to include a two-way interaction in a model without including the main effects? What if your hypothesis is only about the interaction, do you still need to include the main effects?
Glen
126 votes, 1 answer
What is an ablation study? And is there a systematic way to perform it?
What is an ablation study? And is there a systematic way to perform it? For example, I have $n$ predictors in a linear regression, which I will call my model.
How would I perform an ablation study on this? What metrics should I use?
A…
cgo
125 votes, 11 answers
Calculating optimal number of bins in a histogram
I'm interested in finding as close to an optimal method as possible for determining how many bins I should use in a histogram. My data should range from 30 to 350 objects at most, and in particular I'm trying to apply thresholding (like Otsu's method) where…
Tony Stark
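The question looks for the most principled rule available. As a starting point only, here is a sketch of two widely used rules of thumb. The first is Sturges' rule, $k = \lceil \log_2 n \rceil + 1$. The second is the Freedman-Diaconis rule, which uses the bin width $h = 2\,\mathrm{IQR}\,n^{-1/3}$. Two parts of the sketch are choices made here and are not part of either rule: the quartiles come from `statistics.quantiles(..., method="inclusive")`, and the code falls back to Sturges when the IQR is zero.

```python
import math
import random
import statistics


def sturges_bins(data):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(len(data))) + 1


def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR * n^(-1/3), k = ceil(range / h)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumption for this sketch: fall back to Sturges when the IQR is degenerate.
        return sturges_bins(data)
    width = 2.0 * iqr * len(data) ** (-1.0 / 3.0)
    return max(1, math.ceil((max(data) - min(data)) / width))


# Illustrative synthetic data, not from the question.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
print("Sturges:", sturges_bins(data))
print("Freedman-Diaconis:", freedman_diaconis_bins(data))
```

For the 30 to 350 observations mentioned in the question, Sturges' rule gives between 6 and 10 bins. Freedman-Diaconis adapts to the spread of the data and is less sensitive to outliers, because it relies on the IQR.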
124 votes, 3 answers
Does an unbalanced sample matter when doing logistic regression?
Okay, so I think I have a decent enough sample, taking into account the 20:1 rule of thumb: a fairly large sample (N=374) for a total of 7 candidate predictor variables.
My problem is the following: whatever set of predictor variables I use, the…
Michiel
124 votes, 3 answers
Intuitive explanation of unit root
How would you explain intuitively what a unit root is, in the context of the unit root test?
I'm thinking of explanations along the lines of those I found in this question.
As for unit roots, what little I know is that the unit root…
Lucas Reis
124 votes, 7 answers
Why use gradient descent for linear regression, when a closed-form math solution is available?
I am taking the online Machine Learning courses and learnt about gradient descent for calculating the optimal parameter values of the hypothesis:
$$h(x) = B_0 + B_1 x$$
Why do we need to use gradient descent if we can easily find the values with the formula below?…
Purus
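The formula the question refers to is cut off in the excerpt. This minimal pure-Python sketch assumes it is the usual ordinary least squares closed form for a single predictor, and compares that with batch gradient descent on the same data. The learning rate, the step count, and the data points are illustrative assumptions and do not come from the question.

```python
def closed_form(xs, ys):
    """OLS for one predictor: B1 = Sxy / Sxx, B0 = y_bar - B1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Minimise (1 / 2n) * sum((B0 + B1*x - y)^2) with a fixed learning rate."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = sum(residuals) / n                              # dLoss/dB0
        grad_b1 = sum(r * x for r, x in zip(residuals, xs)) / n   # dLoss/dB1
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


# Hypothetical toy data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
print(closed_form(xs, ys))       # approximately (1.10, 1.96)
print(gradient_descent(xs, ys))  # converges to approximately the same values
```

On a problem this small, the two approaches agree to many decimal places. The usual argument for gradient descent concerns scale. With many predictors or very large datasets, forming and solving the normal equations directly becomes expensive.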
122 votes, 3 answers
tanh activation function vs sigmoid activation function
The tanh activation function is:
$$\tanh(x) = 2\,\sigma(2x) - 1$$
where $\sigma(x)$, the sigmoid function, is defined as:
$$\sigma(x) = \frac{e^x}{1 + e^x}.$$
Questions:
Does it really matter which of those…
satya
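The identity in the question follows directly from the definitions. Multiply the numerator and denominator of $\sigma(2x)$ by $e^{-x}$ to get $\sigma(2x) = \frac{e^{x}}{e^{-x} + e^{x}}$. Hence $2\sigma(2x) - 1 = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \tanh(x)$. The short numerical sketch below checks this on a few sample points. The helper names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic sigmoid in the question's form e^x / (1 + e^x).

    This direct form overflows for very large x; it is fine for the small range checked here.
    """
    return math.exp(x) / (1.0 + math.exp(x))


def tanh_via_sigmoid(x):
    """tanh(x) expressed as 2 * sigma(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)
print("identity holds on the sampled points")
```

Because tanh is a shifted and rescaled sigmoid, the two activations differ mainly in output range: $(-1, 1)$ for tanh and $(0, 1)$ for the sigmoid.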
121 votes, 4 answers
Why does the Lasso provide Variable Selection?
I've been reading Elements of Statistical Learning, and I would like to know why the Lasso provides variable selection and ridge regression doesn't.
Both methods minimize the residual sum of squares and have a constraint on the possible values of…
Shiwen
121 votes, 5 answers
How do you calculate precision and recall for multiclass classification using confusion matrix?
I wonder how to compute precision and recall using a confusion matrix for a multi-class classification problem. Specifically, an observation can only be assigned to its most probable class / label. I would like to compute:
Precision = TP / (TP+FP)…
daiyue
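Here is a minimal sketch of the per-class computation. It assumes that the rows of the confusion matrix are true classes and the columns are predicted classes. With the transposed convention, the same code would swap precision and recall. The example matrix is made up. Returning 0.0 when a denominator is zero is also an assumption, since libraries handle that case in different ways.

```python
def per_class_precision_recall(cm):
    """Return [(precision, recall), ...] for each class.

    Assumes cm[i][j] counts observations whose true class is i and predicted class is j.
    """
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, but truly another class
        fn = sum(cm[c]) - tp                       # truly c, but predicted as another class
        # Assumption: report 0.0 when a denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append((precision, recall))
    return results


# Hypothetical 3-class confusion matrix.
cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 1, 7],
]
for label, (p, r) in enumerate(per_class_precision_recall(cm)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f}")
```

In words, precision for class $c$ divides the diagonal entry by its column sum, and recall divides it by its row sum. Macro-averaging takes the unweighted mean of these per-class values.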
121 votes, 21 answers
At each step of a limiting infinite process, put 10 balls in an urn and remove one at random. How many balls are left?
The question (slightly modified) goes as follows, and if you have never encountered it before, you can find it in Example 6a, Chapter 2, of Sheldon Ross's A First Course in Probability:
Suppose that we possess an infinitely large urn and an infinite…
Carlos Cinelli
121 votes, 5 answers
Comprehensive list of activation functions in neural networks with pros/cons
Are there any reference document(s) that give a comprehensive list of activation functions in neural networks along with their pros/cons (and ideally some pointers to publications where they were successful or not so successful)?
Franck Dernoncourt
120 votes, 6 answers
Is it possible to train a neural network without backpropagation?
Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient.
Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…
Haitao Du
120 votes, 9 answers
How does the reparameterization trick for VAEs work and why is it important?
How does the reparameterization trick for variational autoencoders (VAE) work? Is there an intuitive and easy explanation without simplifying the underlying math? And why do we need the 'trick'?
David Dao
119 votes, 4 answers
Assessing approximate distribution of data based on a histogram
Suppose I want to see whether my data is exponential based on a histogram (i.e., skewed to the right).
Depending on how I group or bin the data, I can get wildly different histograms.
One set of histograms will make it seem that the data is…
guestoeijreor
119 votes, 5 answers
Mean absolute error OR root mean squared error?
Why use Root Mean Squared Error (RMSE) instead of Mean Absolute Error (MAE)?
I've been investigating the error generated in a calculation - I initially calculated the error as a Root Mean Normalised Squared Error.
Looking a little closer, I see the…
user1665220
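This small sketch compares the two metrics on made-up error vectors to show where they differ. RMSE is always at least as large as MAE, and the two are equal only when every absolute error has the same size. RMSE grows when the same total absolute error is concentrated in a few large misses. The function names and data are illustrative.

```python
import math


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


even = [2.0, -2.0, 2.0, -2.0]   # every error has the same magnitude
outlier = [0.0, 0.0, 0.0, 8.0]  # same total absolute error, concentrated in one point
print(mae(even), rmse(even))        # 2.0 2.0
print(mae(outlier), rmse(outlier))  # 2.0 4.0
```

Both vectors have the same MAE. RMSE doubles for the second one because squaring weights the single large error more heavily. That is the usual reason to prefer RMSE when large errors are especially costly, and MAE when they are not.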