Highest Voted Questions - Statistical Analysis Stack Exchange

44

votes

2 answers

Is cosine similarity identical to l2-normalized euclidean distance?

Identical in the sense that it will produce identical results for a similarity ranking between a vector u and a set of vectors V. I have a vector space model which has a distance measure (euclidean distance, cosine similarity) and normalization technique…

asked Apr 13 '15 at 22:58

Arne

543
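
The ranking question above can be checked numerically. For unit-length vectors, $\|\hat{u}-\hat{v}\|^2 = 2 - 2\cos(u,v)$, so sorting by increasing distance between L2-normalized vectors should give the same order as sorting by decreasing cosine similarity. The following is a minimal sketch of that check, not taken from the thread; the vectors `u` and `V` and all function names are made up for illustration.

```python
import math

def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Ordinary Euclidean distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical query vector and candidate set (illustrative values only).
u = [1.0, 2.0, 3.0]
V = [[3.0, 2.0, 1.0], [2.0, 4.0, 6.5], [-1.0, 0.5, 2.0], [0.1, 0.2, 0.25]]

# Rank candidates by decreasing cosine similarity.
by_cosine = sorted(range(len(V)), key=lambda i: -cosine_similarity(u, V[i]))

# Rank candidates by increasing distance after L2 normalization.
u_hat = l2_normalize(u)
by_distance = sorted(
    range(len(V)),
    key=lambda i: euclidean_distance(u_hat, l2_normalize(V[i])),
)

# Gap between squared distance and 2 - 2*cos for each candidate.
identity_gaps = [
    abs(euclidean_distance(u_hat, l2_normalize(v)) ** 2
        - (2 - 2 * cosine_similarity(u, v)))
    for v in V
]
```

Here `by_cosine` and `by_distance` hold the same index order, and every entry of `identity_gaps` is zero up to floating-point error. The distance values themselves differ from the similarity values; only the ranking coincides.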

44

votes

2 answers

Measures of variable importance in random forests

I've been playing around with random forests for regression and am having difficulty working out exactly what the two measures of importance mean, and how they should be interpreted. The importance() function gives two values for each variable:…

asked Jul 04 '11 at 08:25

dcl

2,762

44

votes

2 answers

Interpretation of plot (glm.model)

Can anyone tell me how to interpret the 'residuals vs fitted', 'normal q-q', 'scale-location', and 'residuals vs leverage' plots? I am fitting a binomial GLM, saving it and then plotting it.

asked Oct 26 '14 at 17:38

Summer

441

44

votes

2 answers

Why should we use t errors instead of normal errors?

In this blog post by Andrew Gelman, there is the following passage: The Bayesian models of 50 years ago seem hopelessly simple (except, of course, for simple problems), and I expect the Bayesian models of today will seem hopelessly simple, 50…

asked Oct 20 '14 at 16:15

Potato

1,085

44

votes

2 answers

Understanding shape and calculation of confidence bands in linear regression

I am trying to understand the origin of the curved shape of confidence bands associated with an OLS linear regression and how it relates to the confidence intervals of the regression parameters (slope and intercept), for example (using…

asked Jun 05 '14 at 16:18

David

441

44

votes

4 answers

Standard error clustering in R (either manually or in plm)

I am trying to understand standard error "clustering" and how to execute it in R (it is trivial in Stata). In R I have been unsuccessful using either plm or writing my own function. I'll use the diamonds data from the ggplot2 package. I can do fixed…

asked Apr 27 '11 at 02:34

Richard Herron

1,261

43

votes

3 answers

Guideline to select the hyperparameters in Deep Learning

I'm looking for a paper that could help in giving a guideline on how to choose the hyperparameters of a deep architecture, like stacked auto-encoders or deep belief networks. There are a lot of hyperparameters and I'm very confused on how to choose…

asked Apr 28 '14 at 12:48

Jack Twain

8,381

43

votes

3 answers

Random number - set.seed(N) in R

I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results. But what I don't get is what the values themselves mean. I am playing…

asked Feb 12 '14 at 02:09

mylesg

623

43

votes

3 answers

What is the meaning of a confidence interval taken from bootstrapped resamples?

I've been looking at numerous questions on this site regarding bootstrapping and confidence intervals, but I'm still confused. Part of the reason for my confusion is probably that I'm not advanced enough in my statistics knowledge to understand a…

asked Jan 29 '14 at 16:13

iarwain

433

43

votes

4 answers

What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?

I have tried reading up on different sources, but I am still not clear which test would be appropriate in my case. There are three different questions I am asking about my dataset: The subjects are tested for infections from X at different…

asked Nov 18 '13 at 13:37

Anto

763

43

votes

1 answer

What does the anova() command do with a lmer model object?

Hopefully this is a question that someone here can answer for me on the nature of decomposing sums of squares from a mixed-effects model fit with lmer (from the lme4 R package). First off I should say that I am aware of the controversy with using…

asked Oct 04 '13 at 14:05

Martyn

576

43

votes

3 answers

Which variance inflation factor should I be using: $\text{GVIF}$ or $\text{GVIF}^{1/(2\cdot\text{df})}$?

I'm trying to interpret variance inflation factors using the vif function in the R package car. The function prints both a generalised $\text{VIF}$ and also $\text{GVIF}^{1/(2\cdot\text{df})}$. According to the help file, this latter value To…

asked Sep 22 '13 at 04:57

jay

1,205

43

votes

3 answers

Difference between a SVM and a perceptron

I am a bit confused with the difference between an SVM and a perceptron. Let me try to summarize my understanding here, and please feel free to correct where I am wrong and fill in what I have missed. The Perceptron does not try to optimize the…

asked Jun 07 '13 at 19:15

CuriousMind

2,253

43

votes

9 answers

How can I efficiently model the sum of Bernoulli random variables?

I am modeling a random variable ($Y$) which is the sum of some ~15-40k independent Bernoulli random variables ($X_i$), each with a different success probability ($p_i$). Formally, $Y=\sum X_i$ where $\Pr(X_i=1)=p_i$ and $\Pr(X_i=0)=1-p_i$. I am…

asked Dec 10 '10 at 11:06

David B

1,321
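
A sum of independent Bernoulli variables with different $p_i$, as in the question above, follows what is usually called a Poisson binomial distribution. Its mean is $\sum p_i$ and its variance is $\sum p_i(1-p_i)$. The sketch below is not taken from the thread's answers. Under the assumption of a small, made-up probability list `p`, it computes those moments, builds the exact PMF by adding one variable at a time, and includes a normal approximation as one possible shortcut for large $n$.

```python
import math

def poisson_binomial_moments(p):
    """Exact mean and variance of Y = sum of independent Bernoulli(p_i)."""
    mean = sum(p)
    var = sum(pi * (1 - pi) for pi in p)
    return mean, var

def poisson_binomial_pmf(p):
    """Exact PMF of Y, built by adding one Bernoulli variable at a time.

    Runs in O(n^2) time, so it is only a reference for moderate n.
    """
    pmf = [1.0]  # with no variables, Y = 0 with probability 1
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - pi)   # new variable is 0
            nxt[k + 1] += mass * pi     # new variable is 1
        pmf = nxt
    return pmf

def normal_cdf(x, mean, var):
    """Normal CDF, usable as a rough approximation to P(Y <= x) for large n."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

# Hypothetical success probabilities (illustrative values only).
p = [0.1, 0.5, 0.9, 0.3]

mean, var = poisson_binomial_moments(p)
pmf = poisson_binomial_pmf(p)

# Moments recomputed from the exact PMF, as a consistency check.
pmf_mean = sum(k * mass for k, mass in enumerate(pmf))
pmf_var = sum((k - pmf_mean) ** 2 * mass for k, mass in enumerate(pmf))
```

For the 15-40k variables mentioned in the question, the quadratic recursion takes on the order of $10^8$ to $10^9$ elementary updates, which is slow in pure Python. Whether the normal approximation is accurate enough depends on the $p_i$ and on how far into the tails the probabilities are needed.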

43

votes

4 answers

Estimate quantile of value in a vector

I have a set of real numbers. I need to estimate the quantile of a new number. Is there any clean way to do this in R? Or in general? I hope this is not ultra-trivial ;-) Your response is much appreciated. PK

asked Feb 15 '13 at 18:06

polarise

563
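
One common reading of the question above is the empirical CDF: the quantile of a new number is estimated as the fraction of observed values less than or equal to it. In R this is what `ecdf(x)(new_value)` returns. The sketch below shows the same idea in Python, assuming that reading; the function name `empirical_quantile` and the sample values are made up for illustration.

```python
from bisect import bisect_right

def empirical_quantile(values, x):
    """Fraction of observed values that are <= x (empirical CDF at x)."""
    ordered = sorted(values)
    # bisect_right counts how many sorted entries are <= x.
    return bisect_right(ordered, x) / len(ordered)

# Hypothetical sample (illustrative values only).
sample = [2.5, 0.1, 3.7, 1.2, 4.4]

# Three of the five observations (0.1, 1.2, 2.5) are <= 3.0.
q = empirical_quantile(sample, 3.0)
```

Other conventions exist, for example using a strict inequality or splitting ties, and they give slightly different answers on small samples.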
