Most Popular

1500 questions
85 votes, 2 answers

What is a global max pooling layer, and what is its advantage over a max pooling layer?

Can somebody explain what a global max pooling layer is, and why and when we use it when training a neural network? Does it have any advantage over an ordinary max pooling layer?
Eka • 2,251 reputation
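A minimal pure-Python sketch of the difference, assuming a single-channel 2-D feature map stored as a list of lists (the function names and the toy map are hypothetical). Ordinary max pooling takes the maximum over each small window and still returns a spatial map; global max pooling takes one maximum over the whole map, so each channel collapses to a single number regardless of its height and width.

```python
from typing import List

Matrix = List[List[float]]

def max_pool_2d(fmap: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Ordinary max pooling: max over each size x size window (no padding)."""
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            window = [fmap[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out

def global_max_pool_2d(fmap: Matrix) -> float:
    """Global max pooling: a single max over the entire spatial map."""
    return max(max(row) for row in fmap)

def global_max_pool_channels(channels: List[Matrix]) -> List[float]:
    """One value per channel, so the output length does not depend on the map size."""
    return [global_max_pool_2d(c) for c in channels]

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
print(max_pool_2d(fmap))         # [[4, 5], [6, 7]]
print(global_max_pool_2d(fmap))  # 7
```

Because the global version turns any H×W map into one value per channel, it is commonly placed before the final classification layer in place of flattening, which also lets the network accept inputs of varying spatial size; whether that is an advantage depends on whether the task needs the spatial layout that it discards.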
85 votes, 2 answers

What is a "kernel" in plain English?

There are several distinct usages: kernel density estimation, the kernel trick, and kernel smoothing. Please explain what the "kernel" in them means, in plain English, in your own words.
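Of the three usages, kernel density estimation is the easiest to show concretely: the kernel is a small symmetric bump placed on every observation, and the density estimate is the average of those bumps, with a bandwidth `h` controlling their width. A minimal sketch; the Gaussian kernel, the toy data, and the value of `h` are assumptions chosen for illustration.

```python
import math
from typing import Sequence

def gaussian_kernel(u: float) -> float:
    """Standard normal density: a symmetric bump that integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x: float, data: Sequence[float], h: float) -> float:
    """Kernel density estimate at x: (1 / (n * h)) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

data = [-1.0, 0.0, 0.5, 2.0]
# Evaluate the estimated density at a few points; a larger h gives a smoother curve.
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, round(kde(x, data, h=0.5), 4))
```

Since each bump integrates to 1, the averaged estimate does too. Kernel smoothing uses the same kind of bump as a set of weights for averaging nearby responses, whereas the "kernel" in the kernel trick is a different object, a function $k(x, x') = \langle \phi(x), \phi(x') \rangle$ that computes an inner product in a feature space; this sketch does not cover that usage.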
85 votes, 6 answers

Why is it that natural log changes are percentage changes? What is it about logs that makes this so?

Can somebody explain how the properties of logs make it possible to run log-linear regressions in which the coefficients are interpreted as percentage changes?
thewhitetie • 1,057 reputation • 1 gold, 8 silver, 7 bronze badges
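The property doing the work is the first-order Taylor expansion $\ln(1+x) \approx x$ for small $x$. A short derivation, writing $y_0$ and $y_1$ for the old and new values and assuming the log-linear model $\ln y = \alpha + \beta x$ (symbols introduced here for illustration):

$$
\ln y_1 - \ln y_0 = \ln\frac{y_1}{y_0} = \ln\left(1 + \frac{y_1 - y_0}{y_0}\right) \approx \frac{y_1 - y_0}{y_0} \quad \text{when } \left|\frac{y_1 - y_0}{y_0}\right| \ll 1
$$

$$
\frac{y(x+1)}{y(x)} = \frac{e^{\alpha + \beta (x+1)}}{e^{\alpha + \beta x}} = e^{\beta} \approx 1 + \beta
$$

So a one-unit increase in $x$ changes $y$ by roughly $100\beta$ percent. The approximation degrades as $|\beta|$ grows: for $\beta = 0.5$ the exact change is $e^{0.5} - 1 \approx 64.9\%$, not 50%.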
85 votes, 6 answers

Is variable selection for predictive modeling really needed in 2016?

This question was asked on CV some years ago; it seems worth reposting in light of 1) order-of-magnitude better computing technology (e.g., parallel computing, HPC) and 2) newer techniques, e.g., [3]. First, some context. Let's assume the goal…
horaceT • 3,352 reputation
85 votes, 14 answers

When (if ever) is a frequentist approach substantively better than a Bayesian?

Background: I do not have any formal training in Bayesian statistics (though I am very interested in learning more), but I know enough, I think, to get the gist of why many feel that Bayesian methods are preferable to frequentist statistics. Even the…
jsakaluk • 5,514 reputation • 1 gold, 23 silver, 47 bronze badges
84 votes, 5 answers

What are good RMSE values?

Suppose I have a dataset and fit a regression on it. I also have a separate test dataset, on which I evaluate the regression and compute the RMSE. How should I decide whether my learning algorithm has done well, i.e., what properties…
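RMSE is expressed in the units of the target variable, so a value is only "good" relative to something: the scale of the target, the accuracy the application needs, or a simple baseline. A minimal sketch of the baseline comparison, assuming the baseline always predicts the training-set mean; the data and the model's predictions are made up for illustration.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_train = [2.0, 4.0, 6.0, 8.0]
y_test = [3.0, 5.0, 7.0]
model_pred = [3.5, 4.5, 7.5]  # hypothetical predictions from some fitted regression
# Naive baseline: always predict the training-set mean (5.0 here).
baseline_pred = [sum(y_train) / len(y_train)] * len(y_test)

print(rmse(y_test, model_pred))     # 0.5
print(rmse(y_test, baseline_pred))  # about 1.633
```

A model whose test RMSE is not clearly below the baseline's has learned little beyond the mean; whether a lower value is good enough still depends on the error the application can tolerate.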
84 votes, 1 answer

Help me understand Support Vector Machines

I understand the basics of what a Support Vector Machine aims to do in terms of classifying an input set into several different classes, but what I don't understand are some of the nitty-gritty details. For starters, I'm a bit confused by the use of…
rohanbk • 1,257 reputation
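For the nitty-gritty, it can help to see the optimization a linear SVM solves. A minimal sketch for the binary case only, assuming a soft-margin linear SVM written in its primal form (regularized hinge loss) and trained by plain full-batch subgradient descent, not the quadratic-programming or SMO solvers typical libraries use; the hyperparameter values, function names, and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))
    by full-batch subgradient descent. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Decision rule: the side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, xi) for xi in X])  # [1, 1, 1, -1, -1, -1]
```

Only points with margin below 1 contribute to the hinge gradient; those are the support vectors, and the regularizer trades margin width against violations.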
84 votes, 10 answers

What are the major philosophical, methodological, and terminological differences between econometrics and other statistical fields?

Econometrics has substantial overlap with traditional statistics, but often uses its own jargon about a variety of topics ("identification," "exogenous," etc.). I once heard an applied statistics professor in another field comment that frequently…
84 votes, 9 answers

Probability of a single real-life future event: What does it mean when they say that "Hillary has a 75% chance of winning"?

As the election is a one-time event, it is not an experiment that can be repeated. So what exactly does the statement "Hillary has a 75% chance of winning" technically mean? I am seeking a statistically correct definition, not an intuitive or…
pitosalas • 963 reputation
84 votes, 6 answers

Does no correlation imply no causality?

I know that correlation does not imply causality, but does an absence of correlation imply an absence of causality?
user2088176 • 945 reputation • 1 gold, 6 silver, 9 bronze badges
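Part of the answer is that Pearson correlation only measures linear association. A minimal sketch, assuming a toy deterministic mechanism in which y is generated from x as `y = x**2` on values symmetric around zero: y depends entirely on x, yet the correlation is exactly zero. Zero correlation can also arise in other ways, such as opposing causal paths that cancel, which this sketch does not show.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y is fully determined by x, but not linearly
print(pearson(x, y))  # 0.0
```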
84 votes, 6 answers

What is an intuitive explanation for how PCA turns from a geometric problem (with distances) to a linear algebra problem (with eigenvectors)?

I've read a lot about PCA, including various tutorials and questions (such as this one, this one, this one, and this one). The geometric problem that PCA is trying to optimize is clear to me: PCA tries to find the first principal component by…
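The link between the two views is Pythagoras: for each centered point, its squared length splits into the squared projection onto a unit direction $u$ plus the squared perpendicular distance to the line spanned by $u$. The squared lengths are fixed, so minimizing the mean squared distance is the same as maximizing the projected variance $u^\top \Sigma u$, and over unit vectors that quantity is maximized by the top eigenvector of the covariance matrix $\Sigma$. A minimal 2-D sketch, assuming toy data and a brute-force angle search for the geometric side (all names are hypothetical):

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def center(points: List[Point]) -> List[Point]:
    """Subtract the mean so that lines through the origin pass through the centroid."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

def mean_sq_perp_distance(points: List[Point], theta: float) -> float:
    """Geometric objective: mean squared distance from the centered points to the
    line through the origin with direction (cos theta, sin theta)."""
    u, v = math.cos(theta), math.sin(theta)
    return sum((-x * v + y * u) ** 2 for x, y in center(points)) / len(points)

def covariance_2d(points: List[Point]) -> Tuple[float, float, float]:
    """Entries (a, b, c) of the covariance matrix [[a, b], [b, c]] (divisor n)."""
    cp = center(points)
    n = len(cp)
    a = sum(x * x for x, _ in cp) / n
    b = sum(x * y for x, y in cp) / n
    c = sum(y * y for _, y in cp) / n
    return a, b, c

def top_eigen_2d(a: float, b: float, c: float) -> Tuple[float, float]:
    """Largest eigenvalue of [[a, b], [b, c]] and the angle of its eigenvector."""
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    theta = 0.5 * math.atan2(2 * b, a - c)
    return lam, theta

points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

# Geometric route: search directions in [0, pi) in 0.1-degree steps.
grid = [k * math.pi / 1800 for k in range(1800)]
best_theta = min(grid, key=lambda t: mean_sq_perp_distance(points, t))

# Linear-algebra route: eigenvector of the covariance matrix.
a, b, c = covariance_2d(points)
lam, theta = top_eigen_2d(a, b, c)

print(round(best_theta, 3), round(theta, 3))  # the two angles agree to grid precision
```

The brute-force search only works for toy dimensions; the eigen-decomposition (or an SVD of the centered data) gives the same direction directly in any dimension.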
83 votes, 2 answers

XKCD's modified Bayes theorem: actually kinda reasonable?

I know this is from a comic famous for taking advantage of certain analytical tendencies, but it actually looks kind of reasonable after a few minutes of staring. Can anyone outline for me what this "modified Bayes theorem" is doing?
83 votes, 6 answers

Choosing a clustering method

When using cluster analysis on a data set to group similar cases, one needs to choose among a large number of clustering methods and measures of distance. Sometimes, one choice might influence the other, but there are many possible combinations of…
Brett • 6,194 reputation • 3 gold, 33 silver, 41 bronze badges
83 votes, 28 answers

Examples for teaching: Correlation does not mean causation

There is an old saying: "Correlation does not mean causation". When I teach, I tend to use the following standard examples to illustrate this point: number of storks and birth rate in Denmark; number of priests in America and alcoholism; in the…
csgillespie • 13,029 reputation
83 votes, 3 answers

Why do neural network researchers care about epochs?

An epoch in stochastic gradient descent is defined as a single pass through the data. For each SGD minibatch, $k$ samples are drawn, the gradient is computed, and the parameters are updated. In the epoch setting, the samples are drawn without…
Sycorax • 90,934 reputation
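The distinction at stake is between sampling without replacement (shuffle once per epoch, then walk through the data) and drawing each minibatch independently with replacement. A minimal sketch of the two index schemes, with hypothetical function names and a seeded generator; the gradient step itself is omitted.

```python
import random
from typing import Iterator, List

def epoch_minibatches(n: int, k: int, rng: random.Random) -> Iterator[List[int]]:
    """One epoch: shuffle the indices, then yield consecutive minibatches of size k.
    Every index appears exactly once; the last batch may be smaller."""
    order = list(range(n))
    rng.shuffle(order)
    for start in range(0, n, k):
        yield order[start:start + k]

def with_replacement_minibatches(
    n: int, k: int, steps: int, rng: random.Random
) -> Iterator[List[int]]:
    """Each minibatch draws k indices uniformly with replacement, independently."""
    for _ in range(steps):
        yield [rng.randrange(n) for _ in range(k)]

rng = random.Random(0)
n, k = 10, 3
seen_epoch = [i for batch in epoch_minibatches(n, k, rng) for i in batch]
seen_repl = [i for batch in with_replacement_minibatches(n, k, steps=4, rng=rng) for i in batch]
print(sorted(seen_epoch))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(set(seen_repl)), "distinct indices out of", n)  # often fewer than n
```

With the epoch scheme every sample contributes exactly once per pass, which is one practical reason epochs are a convenient unit for scheduling and reporting; with replacement, some samples may be seen several times and others not at all over the same number of draws.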