Highest Voted Questions - Statistical Analysis Stack Exchange

115

votes

21 answers

What's a real-world example of "overfitting"?

I kind of understand what "overfitting" means, but I need help coming up with a real-world example of it.

overfitting

asked Dec 11 '14 at 06:28

user3851283

307

115

votes

11 answers

"Best" series of colors to use for differentiating series in publication-quality plots

Has any study been done on what the best set of colors is for showing multiple series on the same plot? I've just been using the defaults in matplotlib, and they look a little childish since they're all bright, primary colors.

data-visualization

asked Oct 06 '14 at 14:33

Daisy Sophia Hollman

1,313

114

votes

4 answers

How does the correlation coefficient differ from regression slope?

I would have expected the correlation coefficient to be the same as a regression slope (beta); however, having just compared the two, I found that they are different. How do they differ, and what different information do they give?

asked Jul 17 '12 at 14:43

luciano

14,269

114

votes

2 answers

What is an embedding layer in a neural network?

In many neural network libraries, there are 'embedding layers', like in Keras or Lasagne. I am not sure I understand its function, despite reading the documentation. For example, in the Keras documentation it says: Turn positive integers (indexes)…

asked Nov 20 '15 at 16:43

Francesco

1,243

114

votes

7 answers

What is the difference between a multiclass and a multilabel problem?

What is the difference between a multiclass problem and a multilabel problem?

asked Jun 13 '11 at 05:35

Learner

4,457

113

votes

4 answers

What is the difference between Cross-entropy and KL divergence?

Both the cross-entropy and the KL divergence are tools to measure the distance between two probability distributions, but what is the difference between them? $$ H(P,Q) = -\sum_x P(x)\log Q(x) $$ $$ KL(P \| Q) = \sum_{x} P(x)\log {\frac{P(x)}{Q(x)}}…

asked Jul 19 '18 at 13:02

maso

1,359

111

votes

5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…

asked Sep 20 '13 at 01:17

half-pass

3,740

111

votes

3 answers

Feature selection and cross-validation

I have recently been reading a lot on this site (@Aniko, @Dikran Marsupial, @Erik) and elsewhere about the problem of overfitting occurring with cross-validation (Smialowski et al 2010 Bioinformatics, Hastie, Elements of statistical learning). The…

asked May 04 '12 at 10:09

BGreene

3,283

111

votes

6 answers

On the importance of the i.i.d. assumption in statistical learning

In statistical learning, implicitly or explicitly, one always assumes that the training set $\mathcal{D} = \{ \bf {X}, \bf{y} \}$ is composed of $N$ input/response tuples $({\bf{X}}_i,y_i)$ that are independently drawn from the same joint…

asked May 19 '16 at 13:28

Quantuple

1,546

111

votes

4 answers

Relationship between Poisson and exponential distribution

The waiting times for a Poisson distribution follow an exponential distribution with parameter lambda. But I don't understand it. Poisson models the number of arrivals per unit of time, for example. How is this related to the exponential distribution? Let's say…

asked Aug 25 '10 at 08:33

user862

2,749

110

votes

3 answers

Subscript notation in expectations

What is the exact meaning of the subscript notation $\mathbb{E}_X[f(X)]$ in conditional expectations in the framework of measure theory? These subscripts do not appear in the definition of conditional expectation, but we may see for example in this…

asked Oct 12 '13 at 11:04

Emile

3,460

110

votes

2 answers

What is covariance in plain language?

What is covariance in plain language and how is it linked to the terms dependence, correlation and variance-covariance structure with respect to repeated-measures designs?

asked Jun 03 '12 at 05:01

abc

1,811

110

votes

11 answers

What is the best way to remember the difference between sensitivity, specificity, precision, accuracy, and recall?

Despite having seen these terms 502847894789 times, I cannot for the life of me remember the difference between sensitivity, specificity, precision, accuracy, and recall. They're pretty simple concepts, but the names are highly unintuitive to me,…

asked Oct 31 '14 at 19:14

Jessica

2,091

109

votes

7 answers

Detecting a given face in a database of facial images

I'm working on a little project involving the faces of Twitter users via their profile pictures. A problem I've encountered is that after I filter out all but the images that are clear portrait photos, a small but significant percentage of Twitter…

asked Feb 14 '11 at 22:41

ʞɔıu

1,117

109

votes

10 answers

What is meant by a "random variable"?

What do they mean when they say "random variable"?

asked Jul 19 '10 at 19:37

Baltimark

2,268
