Most Popular
1500 questions
242
votes
20 answers
Intuitive explanation for dividing by $n-1$ when calculating standard deviation?
I was asked today in class why you divide the sum of squared errors by $n-1$ instead of by $n$ when calculating the standard deviation.
I said I am not going to answer it in class (since I didn't wanna go into unbiased estimators), but later I…
Tal Galili
- 21,541
242
votes
4 answers
When (and why) should you take the log of a distribution (of numbers)?
Say I have some historical data, e.g., past stock prices, airline ticket price fluctuations, past financial data of the company...
Now someone (or some formula) comes along and says "let's take/use the log of the distribution" and here's where I go…
PhD
- 14,627
241
votes
11 answers
Why is accuracy not the best measure for assessing classification models?
This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference.
Accuracy, the proportion of correct classifications among…
Tim
- 138,066
241
votes
14 answers
How should I transform non-negative data including zeros?
If I have highly skewed positive data I often take logs. But what should I do with highly skewed non-negative data that include zeros? I have seen two transformations used:
$\log(x+1)$ which has the neat feature that 0 maps to 0.
$\log(x+c)$ where…
Rob Hyndman
- 56,782
232
votes
8 answers
What are the advantages of ReLU over sigmoid function in deep neural networks?
The state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages?
I know that training a network when ReLU is used would be faster, and it is more biological…
RockTheStar
- 12,907
231
votes
5 answers
Which "mean" to use and when?
So we have the arithmetic mean (AM), geometric mean (GM) and harmonic mean (HM). Their mathematical formulations are also well known, along with their associated stereotypical examples (e.g., harmonic mean and its application to 'speed' related…
PhD
- 14,627
229
votes
3 answers
R's lmer cheat sheet
There's a lot of discussion on this forum about the proper way to specify various hierarchical models using lmer.
I thought it would be great to have all the information in one place.
A couple of questions to start:
How to specify multiple…
DBR
223
votes
13 answers
What is the difference between data mining, statistics, machine learning and AI?
Would it be accurate to say that they are 4 fields attempting to solve very similar problems but with different approaches? What exactly do they have in common and…
Olivier Lalonde
- 141
222
votes
8 answers
In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?
Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
d_2
- 2,381
221
votes
6 answers
Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?
I have a dataset that has both continuous and categorical data. I am analyzing it using PCA and am wondering if it is fine to include the categorical variables as part of the analysis. My understanding is that PCA can only be applied to continuous…
Nikolina Icitovic
- 2,211
216
votes
5 answers
How exactly does one “control for other variables”?
Here is the article that motivated this question: Does impatience make us fat?
I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc.) in order to best isolate the true…
JackOfAll
- 2,977
212
votes
4 answers
What does the hidden layer in a neural network compute?
I'm sure many people will respond with links to 'let me google that for you', so I want to say that I've tried to figure this out. Please forgive my lack of understanding here, but I cannot figure out how the practical implementation of a neural…
FAtBalloon
- 2,257
211
votes
7 answers
PCA on correlation or covariance?
What are the main differences between performing principal component analysis (PCA) on the correlation matrix and on the covariance matrix? Do they give the same results?
Random
- 2,290
211
votes
8 answers
What does 1x1 convolution mean in a neural network?
I am currently doing the Udacity Deep Learning Tutorial. In Lesson 3, they talk about a 1x1 convolution. This 1x1 convolution is used in the Google Inception module. I'm having trouble understanding what a 1x1 convolution is.
I have also seen this post…
jkschin
- 2,233
211
votes
9 answers
How to deal with perfect separation in logistic regression?
If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
We…
user333
- 7,211