Highest Voted Questions - Statistical Analysis Stack Exchange

55

votes

3 answers

Help me understand the quantile (inverse CDF) function

I am reading about the quantile function, but it is not clear to me. Could you provide a more intuitive explanation than the one provided below? Since the cdf $F$ is a monotonically increasing function, it has an inverse; let us denote this by…

asked May 16 '16 at 12:03

Inder Gill

693
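
As an illustrative aside to the quantile question above, here is a minimal sketch of what "inverse CDF" means in practice. It is not taken from the question or its answers. The exponential distribution is chosen only because its CDF inverts in closed form, and the function names `exp_cdf`, `exp_quantile`, and `empirical_quantile` are hypothetical.

```python
import math


def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution: F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def exp_quantile(p, rate=1.0):
    """Quantile function: the x at which F(x) = p, valid for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p) / rate


def empirical_quantile(sample, p):
    """Generalized inverse of the empirical CDF: the smallest observed x
    such that at least a fraction p of the sample is <= x."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    xs = sorted(sample)
    k = math.ceil(p * len(xs))  # number of observations that must be <= x
    return xs[k - 1]


# Round trip: applying the CDF to a quantile recovers the probability.
p = 0.3
x = exp_quantile(p)
round_trip = exp_cdf(x)
```

Read this way, the quantile function answers "below which value does a fraction p of the probability lie?", which is the CDF read in the opposite direction.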

55

votes

4 answers

How to interpret Mean Decrease in Accuracy and Mean Decrease GINI in Random Forest models

I'm having some difficulty understanding how to interpret variable importance output from the Random Forest package. Mean decrease in accuracy is usually described as "the decrease in model accuracy from permuting the values in each feature". Is…

asked Feb 22 '16 at 00:19

FlacoT

842

55

votes

3 answers

What kind of information is Fisher information?

Suppose we have a random variable $X \sim f(x|\theta)$. If $\theta_0$ were the true parameter, the likelihood function should be maximized and the derivative equal to zero. This is the basic principle behind the maximum likelihood estimator. As…

asked Feb 14 '16 at 21:42

Stan Shunpike

4,333

55

votes

8 answers

Book for reading before Elements of Statistical Learning?

Based on this post, I want to digest Elements of Statistical Learning. Fortunately it is available for free and I started reading it. I don't have enough knowledge to understand it. Can you recommend a book that is a better introduction to the…

asked Nov 26 '11 at 03:12

B Seven

2,913

55

votes

3 answers

Gradient Boosting for Linear Regression - why does it not work?

While learning about Gradient Boosting, I haven't heard about any constraints regarding the properties of a "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of a GB that uses linear…

asked Dec 16 '15 at 00:41

Matek

951

55

votes

3 answers

What does the term saturating nonlinearities mean?

I was reading the paper ImageNet Classification with Deep Convolutional Neural Networks, and in section 3, where they describe the architecture of their Convolutional Neural Network, they explain how they preferred using: non-saturating nonlinearity…

asked Sep 26 '15 at 19:45

Charlie Parker

6,866

55

votes

5 answers

Is minimizing squared error equivalent to minimizing absolute error? Why is squared error more popular than the latter?

When we conduct linear regression $y=ax+b$ to fit a bunch of data points $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, the classic approach minimizes the squared error. I have long been puzzled by one question: will minimizing the squared error yield the…

asked Apr 18 '15 at 02:17

Tony

1,803

55

votes

1 answer

How large should the batch size be for stochastic gradient descent?

I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?

asked Mar 07 '15 at 21:18

Simon Kuang

2,111

55

votes

5 answers

What is the difference between errors and residuals?

While these two ubiquitous terms are often used synonymously, there sometimes seems to be a distinction. Is there indeed a difference, or are they exactly synonymous?

asked Jan 14 '15 at 15:27

Constantin

1,367

55

votes

5 answers

Bayesian equivalent of two sample t-test?

I'm not looking for a plug-and-play method like BEST in R but rather a mathematical explanation of some Bayesian methods I can use to test the difference between the means of two samples.

asked Dec 26 '14 at 19:06

John

581

55

votes

6 answers

How to determine the optimal threshold for a classifier and generate a ROC curve?

Let's say we have an SVM classifier. How do we generate the ROC curve in theory, given that we compute a TPR and an FPR for each threshold? And how do we determine the optimal threshold for this SVM classifier?

asked Nov 07 '14 at 19:20

RockTheStar

12,907
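
As an illustrative aside to the ROC question above, the threshold sweep it describes can be sketched in a few lines. This is not taken from the question or its answers. The scores stand in for any classifier's decision values (for example, an SVM's signed distances), the data are made up, the function names are hypothetical, and Youden's J is only one of several criteria for picking a threshold.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) triples, one per distinct score.

    A sample is predicted positive when its score is >= the threshold.
    Labels are 1 for positive and anything else for negative.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    # Sweep thresholds from the highest score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        points.append((t, fp / neg, tp / pos))
    return points


def youden_threshold(points):
    """Pick the threshold maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


# Hypothetical decision values and true labels.
scores = [0.9, 0.8, 0.35, 0.1]
labels = [1, 1, 0, 0]
curve = roc_points(scores, labels)
best = youden_threshold(curve)
```

Plotting the FPR against the TPR for each triple traces the ROC curve. Which threshold counts as "optimal" depends on the relative costs of false positives and false negatives, so the criterion used here should be treated as a placeholder.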

55

votes

6 answers

Why downsample?

Suppose I want to learn a classifier that predicts if an email is spam. And suppose only 1% of emails are spam. The easiest thing to do would be to learn the trivial classifier that says none of the emails are spam. This classifier would give us…

asked Nov 02 '14 at 19:25

Jessica

2,091

55

votes

3 answers

Why does frequentist hypothesis testing become biased towards rejecting the null hypothesis with sufficiently large samples?

I was just reading this article on the Bayes factor for a completely unrelated problem when I stumbled upon this passage: Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model…

asked Jul 22 '14 at 20:06

Louis Thibault

693

54

votes

3 answers

What is the distribution of the Euclidean distance between two normally distributed random variables?

Assume you are given two objects whose exact locations are unknown, but are distributed according to normal distributions with known parameters (e.g. $a \sim N(m, s)$ and $b \sim N(v, t)$). We can assume these are both bivariate normals, such that…

asked Apr 05 '11 at 19:10

Nick

3,537

54

votes

2 answers

PP-plots vs. QQ-plots

What is the difference between probability plots, PP-plots and QQ-plots when trying to analyse how well a fitted distribution matches the data?

asked Apr 01 '14 at 14:23

kay

671
