Most Popular
1500 questions
85
votes
2 answers
What is a global max pooling layer, and what is its advantage over a max pooling layer?
Can somebody explain what a global max pooling layer is, and why and when we use it when training a neural network? Does it have any advantage over an ordinary max pooling layer?
Eka
- 2,251
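A minimal sketch of the difference, not taken from the question or its answers. It assumes a channels-first layout (channels x height x width) and uses hypothetical helper names. Ordinary 2x2 max pooling shrinks each feature map but keeps a spatial grid. Global max pooling collapses each whole channel to a single number, its maximum over all positions.

```python
from typing import List

Channel = List[List[float]]  # one feature map: height x width (assumed layout)

def max_pool_2x2(channel: Channel) -> Channel:
    """Ordinary 2x2 max pooling with stride 2: the output is still a (smaller) grid."""
    h, w = len(channel), len(channel[0])
    return [
        [
            max(channel[i][j], channel[i][j + 1], channel[i + 1][j], channel[i + 1][j + 1])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]

def global_max_pool(channels: List[Channel]) -> List[float]:
    """Global max pooling: one value per channel, the max over every spatial position."""
    return [max(max(row) for row in channel) for channel in channels]

# Hypothetical input: 2 channels, each 4x4.
feature_maps = [
    [[1, 2, 0, 4], [5, 6, 7, 8], [0, 1, 2, 3], [9, 0, 1, 2]],
    [[0, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 3]],
]
pooled = [max_pool_2x2(c) for c in feature_maps]  # 2 channels, each now 2x2
global_pooled = global_max_pool(feature_maps)     # 2 numbers, one per channel
print(pooled)
print(global_pooled)
```

Because the global version always returns one value per channel, its output length does not depend on the input's height or width, which is one reason it is used in place of flattening before a classifier head.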
85
votes
2 answers
What is a "kernel" in plain English?
There are several distinct usages:
kernel density estimation
kernel trick
kernel smoothing
Please explain what the "kernel" in them means, in plain English, in your own words.
Neil McGuigan
- 9,872
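For the first usage, kernel density estimation, a minimal sketch may help. It is not from the question and covers neither the kernel trick nor kernel smoothing. Here the "kernel" is a weighting function (a Gaussian bump is assumed) placed on every data point. The density estimate is the average of those bumps, each stretched by a bandwidth. The sample values and bandwidth are hypothetical.

```python
import math
from typing import Sequence

def gaussian_kernel(u: float) -> float:
    """Standard normal density: a symmetric bump that integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x: float, data: Sequence[float], bandwidth: float) -> float:
    """Kernel density estimate at x: the average of kernels centred on each data
    point, each rescaled by the bandwidth so that it still integrates to 1."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (len(data) * bandwidth)

# Hypothetical sample.
sample = [-1.0, 0.0, 0.3, 2.0]
for x in (-2.0, 0.0, 2.0):
    print(f"density estimate at {x:+.1f}: {kde(x, sample, bandwidth=0.5):.4f}")
```

A smaller bandwidth makes each bump narrower, so the estimate follows the individual points more closely. A larger one smooths them together.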
85
votes
6 answers
Why is it that natural log changes are percentage changes? What is it about logs that makes this so?
Can somebody explain how the properties of logs make it so you can do log-linear regressions where the coefficients are interpreted as percentage changes?
thewhitetie
- 1,057
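A minimal numeric sketch of the property being asked about, not from the question or its answers. A difference of natural logs is the log of a ratio, ln(new) - ln(old) = ln(new/old) = ln(1 + r). For a small relative change r, the first-order Taylor term gives ln(1 + r) ≈ r. In a model ln(y) = a + b*x, a one-unit increase in x multiplies y by e**b, which is close to a 100*b percent change only when b is small. The function names are hypothetical.

```python
import math

def log_difference(old: float, new: float) -> float:
    """ln(new) - ln(old), which equals ln(new / old)."""
    return math.log(new) - math.log(old)

def exact_pct_change(b: float) -> float:
    """Exact % change in y per unit of x when ln(y) = a + b*x (y is multiplied by e**b)."""
    return 100.0 * (math.exp(b) - 1.0)

for pct in (1, 5, 10, 50):
    d = log_difference(100.0, 100.0 * (1 + pct / 100))
    # For small changes ln(1 + r) is about r, so d is about pct / 100.
    print(f"{pct:>3}% change -> log difference {d:.4f}")

print(f"log coefficient 0.05 -> exact change {exact_pct_change(0.05):.2f}%")
```

The approximation is close for a 5% change (a log difference of about 0.0488) but drifts for large changes (about 0.405 for a 50% change).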
85
votes
6 answers
Variable selection for predictive modeling really needed in 2016?
This question was asked on CV some years ago, but it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC) and 2) newer techniques, e.g. [3].
First, some context. Let's assume the goal…
horaceT
- 3,352
85
votes
14 answers
When (if ever) is a frequentist approach substantively better than a Bayesian?
Background: I do not have any formal training in Bayesian statistics (though I am very interested in learning more), but I know enough--I think--to get the gist of why many feel as though Bayesian methods are preferable to Frequentist statistics. Even the…
jsakaluk
- 5,514
84
votes
5 answers
What are good RMSE values?
Suppose I have a dataset and perform a regression on it. I have a separate test dataset, on which I test the regression and compute the RMSE. How can I tell whether my learning algorithm has done well, i.e. what properties…
Shishir Pandey
- 1,101
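A minimal sketch of one common way to judge an RMSE, not taken from the question or its answers. RMSE is in the same units as the target, so there is no universal "good" value. Comparing it against a trivial baseline, such as always predicting the training mean, is one reference point. All data below are hypothetical, and the mean baseline is only one possible choice.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the units of the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mean_baseline_rmse(y_train: Sequence[float], y_test: Sequence[float]) -> float:
    """RMSE of always predicting the training mean: a simple reference point."""
    mean = sum(y_train) / len(y_train)
    return rmse(y_test, [mean] * len(y_test))

# Hypothetical data.
y_train = [3.0, 5.0, 4.0, 6.0, 7.0]
y_test = [4.5, 6.5, 5.0]
y_pred = [4.8, 6.1, 5.2]  # predictions from some fitted regression

model_score = rmse(y_test, y_pred)
baseline_score = mean_baseline_rmse(y_train, y_test)
print(f"model RMSE {model_score:.3f} vs mean-baseline RMSE {baseline_score:.3f}")
```

Whether the gap is large enough still depends on the application, e.g. on how big an error is tolerable in the target's units.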
84
votes
1 answer
Help me understand Support Vector Machines
I understand the basics of what a Support Vector Machine aims to do in terms of classifying an input set into several different classes, but what I don't understand are some of the nitty-gritty details. For starters, I'm a bit confused by the use of…
rohanbk
- 1,257
84
votes
10 answers
What are the major philosophical, methodological, and terminological differences between econometrics and other statistical fields?
Econometrics has substantial overlap with traditional statistics, but often uses its own jargon about a variety of topics ("identification," "exogenous," etc.). I once heard an applied statistics professor in another field comment that frequently…
Ari B. Friedman
- 3,591
84
votes
9 answers
Probability of a single real-life future event: What does it mean when they say that "Hillary has a 75% chance of winning"?
As the election is a one-time event, it is not an experiment that can be repeated. So what exactly does the statement "Hillary has a 75% chance of winning" technically mean? I am seeking a statistically correct definition, not an intuitive or…
pitosalas
- 963
84
votes
6 answers
Does no correlation imply no causality?
I know that correlation does not imply causality, but does an absence of correlation imply an absence of causality?
user2088176
- 945
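A minimal sketch of why zero correlation does not rule out dependence, not from the question or its answers. Pearson correlation measures only linear association. In the hypothetical data below, y is computed directly from x, so x fully determines y. Because the relationship is symmetric rather than linear, the sample correlation is exactly zero. This is one counterexample and does not settle the broader causal question.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical, symmetric around 0
ys = [x * x for x in xs]        # y is fully determined by x
r = pearson(xs, ys)
print(f"correlation between x and x**2: {r:.3f}")
```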
84
votes
6 answers
What is an intuitive explanation for how PCA turns from a geometric problem (with distances) to a linear algebra problem (with eigenvectors)?
I've read a lot about PCA, including various tutorials and questions (such as this one, this one, this one, and this one).
The geometric problem that PCA is trying to optimize is clear to me: PCA tries to find the first principal component by…
stackoverflowuser2010
- 3,550
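A minimal 2-D sketch linking the two views, not from the question or its answers. For centered data, the mean squared projection onto a unit direction u equals u^T Σ u, where Σ is the covariance matrix. By Pythagoras, each point's squared distance to the line equals its squared norm minus its squared projection. Maximizing projected variance is therefore the same as minimizing squared distances to the line. The direction that maximizes u^T Σ u is the top eigenvector of Σ. The code checks this by brute force over angles. The points and helper names are hypothetical.

```python
import math

def center(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(x - mx, y - my) for x, y in points]

def covariance_2d(points):
    """Entries (a, b, c) of the covariance matrix [[a, b], [b, c]] (divides by n)."""
    pts = center(points)
    n = len(pts)
    a = sum(x * x for x, _ in pts) / n
    b = sum(x * y for x, y in pts) / n
    c = sum(y * y for _, y in pts) / n
    return a, b, c

def projected_variance(cov, theta):
    """u^T Sigma u for the unit vector u = (cos theta, sin theta)."""
    a, b, c = cov
    u, v = math.cos(theta), math.sin(theta)
    return a * u * u + 2 * b * u * v + c * v * v

def mean_sq_distance(points, theta):
    """Geometric view: mean squared perpendicular distance to the line at angle theta."""
    u, v = math.cos(theta), math.sin(theta)
    pts = center(points)
    return sum(x * x + y * y - (x * u + y * v) ** 2 for x, y in pts) / len(pts)

def top_eigen(cov):
    """Largest eigenvalue of a symmetric 2x2 matrix and the angle of its eigenvector."""
    a, b, c = cov
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return lam, 0.5 * math.atan2(2 * b, a - c)

points = [(2.0, 1.9), (0.5, 0.7), (-1.0, -0.8), (-1.5, -1.9), (0.0, 0.1)]
cov = covariance_2d(points)
angles = [k * math.pi / 1800 for k in range(1800)]
best_by_variance = max(angles, key=lambda t: projected_variance(cov, t))
best_by_distance = min(angles, key=lambda t: mean_sq_distance(points, t))
lam, eigen_angle = top_eigen(cov)
print(best_by_variance, best_by_distance, eigen_angle)
```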
83
votes
2 answers
XKCD's modified Bayes theorem: actually kinda reasonable?
I know this is from a comic famous for taking advantage of certain analytical tendencies, but it actually looks kind of reasonable after a few minutes of staring. Can anyone outline for me what this "modified Bayes theorem" is doing?
eric_kernfeld
- 5,209
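A minimal sketch of the algebra, assuming the comic's formula as it is commonly transcribed: P(H|X) = P(H) × (1 + P(C) × (P(X|H)/P(X) − 1)), where C is the event that you are using Bayesian statistics correctly. Expanding it gives (1 − P(C))·P(H) + P(C)·P(H)P(X|H)/P(X). That is a weighted average of the prior and the ordinary Bayes posterior, with weight P(C) on the posterior. The numbers are hypothetical.

```python
def bayes_posterior(p_h: float, p_x_given_h: float, p_x: float) -> float:
    """Ordinary Bayes' theorem: P(H|X) = P(H) * P(X|H) / P(X)."""
    return p_h * p_x_given_h / p_x

def xkcd_modified_bayes(p_h: float, p_x_given_h: float, p_x: float, p_c: float) -> float:
    """The comic's formula (as commonly transcribed)."""
    return p_h * (1 + p_c * (p_x_given_h / p_x - 1))

def as_mixture(p_h: float, p_x_given_h: float, p_x: float, p_c: float) -> float:
    """Same quantity rewritten: weight P(C) on the posterior, 1 - P(C) on the prior."""
    return (1 - p_c) * p_h + p_c * bayes_posterior(p_h, p_x_given_h, p_x)

# Hypothetical probabilities.
p_h, p_x_given_h, p_x = 0.3, 0.8, 0.5
for p_c in (0.0, 0.5, 1.0):
    print(p_c, xkcd_modified_bayes(p_h, p_x_given_h, p_x, p_c))
```

With P(C) = 0 the prior is returned unchanged, and with P(C) = 1 the formula reduces to ordinary Bayes' theorem.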
83
votes
6 answers
Choosing a clustering method
When using cluster analysis on a data set to group similar cases, one needs to choose among a large number of clustering methods and measures of distance. Sometimes, one choice might influence the other, but there are many possible combinations of…
Brett
- 6,194
83
votes
28 answers
Examples for teaching: Correlation does not mean causation
There is an old saying: "Correlation does not mean causation". When I teach, I tend to use the following standard examples to illustrate this point:
number of storks and birth rate in Denmark;
number of priests in America and alcoholism;
in the…
csgillespie
- 13,029
83
votes
3 answers
Why do neural network researchers care about epochs?
An epoch in stochastic gradient descent is defined as a single pass through the data. For each SGD minibatch, $k$ samples are drawn, the gradient is computed, and the parameters are updated. In the epoch setting, the samples are drawn without…
Sycorax
- 90,934
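A minimal sketch of the two sampling schemes the excerpt contrasts, not from the question or its answers. It assumes n examples indexed 0..n-1 and the excerpt's minibatch size k, and the helper names are hypothetical. Both schemes take the same number of gradient steps (n/k). An epoch touches every example exactly once. Independent draws with replacement leave about (1 − 1/n)^n ≈ 1/e, roughly 37%, of the examples unseen in expectation.

```python
import random
from typing import List

def epoch_minibatches(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """One epoch: shuffle the indices once, then cut them into minibatches of size k."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + k] for i in range(0, n, k)]

def iid_minibatches(n: int, k: int, steps: int, rng: random.Random) -> List[List[int]]:
    """Alternative: every minibatch draws k indices uniformly, with replacement."""
    return [[rng.randrange(n) for _ in range(k)] for _ in range(steps)]

n, k = 10_000, 50
rng = random.Random(0)
epoch = epoch_minibatches(n, k, rng)
iid = iid_minibatches(n, k, steps=n // k, rng=rng)  # same number of gradient steps

seen_epoch = {i for batch in epoch for i in batch}
seen_iid = {i for batch in iid for i in batch}
print(f"epoch: {len(seen_epoch) / n:.3f} of examples seen")
print(f"with replacement: {len(seen_iid) / n:.3f} of examples seen")
```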