Highest Voted Questions - Statistical Analysis Stack Exchange

42

votes

1 answer

How to determine significant principal components using bootstrapping or Monte Carlo approach?

I am interested in determining the number of significant patterns coming out of a Principal Component Analysis (PCA) or Empirical Orthogonal Function (EOF) Analysis. I am particularly interested in applying this method to climate data. The data…

asked Aug 08 '12 at 12:03

Marc in the box

3,712

42

votes

3 answers

Difference between generalized linear models & generalized linear mixed models

I am wondering what the differences are between mixed and unmixed GLMs. For instance, in SPSS the drop down menu allows users to fit either: analyze-> generalized linear models-> generalized linear models & analyze-> mixed models-> generalized…

asked Jul 16 '12 at 23:47

user9203

689

42

votes

3 answers

How to interpret OOB and confusion matrix for random forest?

I got an R script from someone to run a random forest model. I modified it and ran it with some employee data. We are trying to predict voluntary separations. Here is some additional info: this is a classification model where 0 = employee stayed, 1=…

asked Jun 18 '12 at 17:43

daniellopez46

945

42

votes

4 answers

What is the difference between a stationary test and a unit root test?

What is the difference between the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test and the augmented Dickey-Fuller (ADF) test? Are they testing the same thing? Or do we need to use them in different situations?

asked Jun 16 '12 at 07:53

Flying pig

6,239

42

votes

3 answers

Interpreting residual diagnostic plots for glm models?

I am looking for guidelines on how to interpret residual plots of glm models, especially Poisson, negative binomial, and binomial models. What can we expect from these plots when the models are "correct"? (for example, we expect the variance to grow…

asked May 27 '12 at 21:25

Tal Galili

21,541

42

votes

5 answers

Good games for learning statistical thinking?

Are there any games that get the player to "think like a statistician"? For example, lightbot gets you to "think like a programmer" (in a very basic way). Are there any games - designed for entertainment or teaching - that can help get one comfortable…

asked May 22 '12 at 12:04

Emile

1,097

42

votes

4 answers

Why is max pooling necessary in convolutional neural networks?

Most common convolutional neural networks contain pooling layers to reduce the dimensions of output features. Why couldn't I achieve the same thing by simply increasing the stride of the convolutional layer? What makes the pooling layer necessary?

asked Jul 01 '17 at 01:35

user3667089

533

42

votes

5 answers

Yolo Loss function explanation

I am trying to understand the Yolo v2 loss function: \begin{align} &\lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 ] \\&+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B…

asked Jun 27 '17 at 01:56

Kamel BOUYACOUB

451

42

votes

4 answers

Maximum Mean Discrepancy (distance distribution)

I have two data sets (source and target data) which follow different distributions. I am using MMD - that is a non-parametric distribution distance - to compute marginal distribution between the source and target data. Source data, Xs; target data,…

asked Apr 28 '17 at 15:45

Mahsa

521

42

votes

3 answers

How to take derivative of multivariate normal density?

Say I have a multivariate normal $N(\mu, \Sigma)$ density. I want to get the second (partial) derivative w.r.t. $\mu$. Not sure how to take the derivative of a matrix. Wiki says to take the derivative element by element inside the matrix. I am working…

asked May 01 '12 at 03:24

user1061210

1,085

42

votes

3 answers

Why do naive Bayesian classifiers perform so well?

Naive Bayes classifiers are a popular choice for classification problems. There are many reasons for this, including: "Zeitgeist" - widespread awareness after the success of spam filters about ten years ago; easy to write; the classifier model is…

asked Feb 08 '12 at 20:39

winwaed

1,133

42

votes

2 answers

Dropping one of the columns when using one-hot encoding

My understanding is that in machine learning it can be a problem if your dataset has highly correlated features, as they effectively encode the same information. Recently someone pointed out that when you do one-hot encoding on a categorical…

asked Aug 23 '16 at 13:51

dasboth

728

42

votes

3 answers

Classification/evaluation metrics for highly imbalanced data

I deal with a fraud detection (credit-scoring-like) problem. As such there is a highly imbalanced relation between fraudulent and non-fraudulent observations. http://blog.revolutionanalytics.com/2016/03/com_class_eval_metrics_r.html provides a great…

asked Jul 07 '16 at 08:42

Georg Heiler

595

42

votes

5 answers

Confidence interval for median

I have to find a 95% C.I. on the median and other percentiles. I don't know how to approach this. I mainly use R as a programming tool.

asked Jan 15 '12 at 03:12

Dominic Comtois

2,129

42

votes

14 answers

Regression to the mean vs gambler's fallacy

On the one hand, I have the regression to the mean and on the other hand I have the gambler’s fallacy. Gambler’s fallacy is defined by Miller and Sanjurjo (2019) as “the mistaken belief that random sequences have a systematic tendency towards…

asked Mar 29 '16 at 17:44

Luis P.

761
