Most Popular
1500 questions
44
votes
2 answers
Is cosine similarity identical to l2-normalized euclidean distance?
By identical, I mean that it will produce identical results for a similarity ranking between a vector u and a set of vectors V.
I have a vector space model which has distance measures (Euclidean distance, cosine similarity) and normalization technique…
Arne
- 543
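The question turns on one identity: for unit vectors $\hat u, \hat v$, $\|\hat u - \hat v\|^2 = 2 - 2\cos(u, v)$. The L2-normalized Euclidean distance is therefore a strictly decreasing function of cosine similarity, so both give the same ranking (ties aside). A minimal Python sketch that checks this on a few arbitrary, made-up vectors (`u` and `V` are illustrations only):

```python
import math

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def normalized_euclidean(u, v):
    """Euclidean distance after scaling both vectors to unit L2 length."""
    nu, nv = norm(u), norm(v)
    return math.sqrt(sum((a / nu - b / nv) ** 2 for a, b in zip(u, v)))

u = [1.0, 2.0, 3.0]
V = [[2.0, 4.0, 6.5], [3.0, 0.0, 1.0], [-1.0, 2.0, 0.5], [0.0, 1.0, 1.0]]

# For unit vectors, d^2 = 2 - 2*cos, so distance is strictly decreasing in cosine.
for v in V:
    assert math.isclose(normalized_euclidean(u, v) ** 2, 2 - 2 * cosine(u, v), abs_tol=1e-12)

by_cosine = sorted(range(len(V)), key=lambda i: -cosine(u, V[i]))  # most similar first
by_distance = sorted(range(len(V)), key=lambda i: normalized_euclidean(u, V[i]))  # nearest first
print(by_cosine, by_distance)
```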
44
votes
2 answers
Measures of variable importance in random forests
I've been playing around with random forests for regression and am having difficulty working out exactly what the two measures of importance mean, and how they should be interpreted.
The importance() function gives two values for each variable:…
dcl
- 2,762
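For regression, `importance()` in the randomForest package reports `%IncMSE` (the increase in out-of-bag MSE when a variable's values are randomly permuted) and `IncNodePurity` (the decrease in residual sum of squares from splits on that variable, averaged over trees). The permutation idea behind the first measure can be sketched in plain Python. This is a simplified illustration, not randomForest's implementation: it uses a single hypothetical fitted model `predict` on the training data instead of per-tree out-of-bag samples, and it does not scale the result.

```python
import random

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one column of X is randomly permuted."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and y
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            total += mse(y, [predict(row) for row in X_perm]) - base
        scores.append(total / n_repeats)
    return scores

# Hypothetical fitted model: it uses x0 and ignores x1 entirely.
def predict(row):
    return 3.0 * row[0]

rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * r[0] + rng.gauss(0, 0.5) for r in X]
print(permutation_importance(predict, X, y))
```

A variable the model never uses gets an importance of exactly zero, while permuting an informative variable inflates the error.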
44
votes
2 answers
Interpretation of plot (glm.model)
Can anyone tell me how to interpret the 'residuals vs fitted', 'normal q-q', 'scale-location', and 'residuals vs leverage' plots? I am fitting a binomial GLM, saving it and then plotting it.
Summer
- 441
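One point that often causes confusion with a binomial GLM: R's default residuals for a `glm` object are deviance residuals, and for 0/1 outcomes every residual at a given fitted value can take only one of two values (one for $y=1$, one for $y=0$). Residual plots therefore typically show two curved bands even when the model is adequate. A minimal sketch of the deviance residual for one binary observation, assuming $0 < \hat\mu < 1$:

```python
import math

def deviance_residual(y, mu):
    """Signed deviance residual for a 0/1 outcome y with fitted probability 0 < mu < 1."""
    # Unit deviance: -2 * [y*log(mu) + (1 - y)*log(1 - mu)]
    d = -2.0 * (y * math.log(mu) + (1 - y) * math.log(1 - mu))
    return math.copysign(math.sqrt(d), y - mu)

# At every fitted value there are only two possible residuals, one per outcome,
# which is why binary-data residual plots show two bands.
for mu in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(mu, round(deviance_residual(1, mu), 3), round(deviance_residual(0, mu), 3))
```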
44
votes
2 answers
Why should we use t errors instead of normal errors?
In this blog post by Andrew Gelman, there is the following passage:
The Bayesian models of 50 years ago seem hopelessly simple (except, of course, for simple problems), and I expect the Bayesian models of today will seem hopelessly simple, 50…
Potato
- 1,085
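The practical difference shows up in the tails. Under a normal error model the log-likelihood penalty for a residual grows quadratically, while under a Student-$t$ model it grows only logarithmically, so a single outlier pulls the fit much less. A small sketch comparing the two log-densities (standard normal versus $t$ with 3 degrees of freedom, a value chosen arbitrarily for illustration):

```python
import math

def normal_logpdf(x):
    """Log-density of the standard normal distribution."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x

def t_logpdf(x, nu):
    """Log-density of Student's t with nu degrees of freedom (location 0, scale 1)."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(x * x / nu))

# Near zero the two are similar; far out, a large residual is much less "surprising" under t.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  normal={normal_logpdf(x):8.2f}  t(3)={t_logpdf(x, 3):8.2f}")
```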
44
votes
2 answers
Understanding shape and calculation of confidence bands in linear regression
I am trying to understand the origin of the curved shape of confidence bands associated with an OLS linear regression and how it relates to the confidence intervals of the regression parameters (slope and intercept), for example (using…
David
- 441
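The curvature follows from the variance of the fitted mean at a point $x_0$. Writing $\hat y_0 = \bar y + \hat\beta_1 (x_0 - \bar x)$ and using the fact that $\bar y$ and $\hat\beta_1$ are uncorrelated, $\operatorname{Var}(\hat y_0) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)$ with $S_{xx} = \sum_i (x_i - \bar x)^2$. The band $\hat y_0 \pm t_{n-2}\, s \sqrt{1/n + (x_0 - \bar x)^2/S_{xx}}$ is therefore narrowest at $\bar x$ and widens hyperbolically, because the slope uncertainty is multiplied by the distance from $\bar x$. A minimal sketch of the square-root factor (the $x$ values are made up):

```python
import math

def band_factor(x, x0):
    """sqrt(1/n + (x0 - xbar)^2 / Sxx); the band half-width is t_{n-2} * s * this factor."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
for x0 in [3.0, 4.0, 5.0, 7.0]:
    print(x0, round(band_factor(x, x0), 4))
```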
44
votes
4 answers
Standard error clustering in R (either manually or in plm)
I am trying to understand standard error "clustering" and how to execute it in R (it is trivial in Stata). In R I have been unsuccessful using either plm or writing my own function. I'll use the diamonds data from the ggplot2 package.
I can do fixed…
Richard Herron
- 1,261
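For reference, the clustered variance is the sandwich $V = (X^\top X)^{-1}\left(\sum_g X_g^\top e_g e_g^\top X_g\right)(X^\top X)^{-1}$, where $g$ indexes clusters and $e_g$ holds the OLS residuals in cluster $g$; Stata additionally multiplies by $\frac{G}{G-1}\cdot\frac{N-1}{N-K}$. A minimal plain-Python sketch for a single regressor with an intercept, which may help when checking a hand-written R function. It does not use plm or the diamonds data; the toy `x`, `y`, and `cluster` values are hypothetical.

```python
import math
from collections import defaultdict

def ols_cluster_se(x, y, cluster, small_sample=True):
    """OLS of y on [1, x]; returns (beta, cluster-robust standard errors)."""
    n = len(x)
    # Bread: inverse of X'X for the design matrix with columns [1, x].
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    beta = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]

    # Per-cluster score sums s_g = sum over i in g of [1, x_i] * e_i.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, ei, g in zip(x, resid, cluster):
        scores[g][0] += ei
        scores[g][1] += xi * ei

    # Meat: sum over clusters of the outer product s_g s_g'.
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(2)] for a in range(2)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    V = matmul(matmul(inv, meat), inv)  # sandwich: bread * meat * bread
    if small_sample:
        G, k = len(scores), 2
        c = G / (G - 1) * (n - 1) / (n - k)  # Stata-style finite-sample factor
        V = [[c * v for v in row] for row in V]
    return beta, [math.sqrt(V[0][0]), math.sqrt(V[1][1])]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.3, 2.8, 4.5, 5.2, 5.8, 7.4, 8.1]
cluster = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(ols_cluster_se(x, y, cluster))
```

With every observation in its own cluster and no correction factor, this reduces to White's HC0 heteroskedasticity-robust estimator, which makes a convenient sanity check.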
43
votes
3 answers
Guideline to select the hyperparameters in Deep Learning
I'm looking for a paper that could help in giving a guideline on how to choose the hyperparameters of a deep architecture, like stacked auto-encoders or deep belief networks. There are a lot of hyperparameters and I'm very confused about how to choose…
Jack Twain
- 8,381
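As one concrete baseline (a swapped-in technique, not necessarily what the answers recommend), hyperparameters are often searched by random sampling, with scale-type parameters such as the learning rate drawn log-uniformly. A minimal sketch; `validation_loss` is a hypothetical stand-in for training a network and scoring it on held-out data, and the ranges are illustrative assumptions:

```python
import math
import random

def sample_config(rng):
    # Scale-type hyperparameters such as the learning rate are sampled log-uniformly.
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "hidden_units": rng.choice([64, 128, 256, 512]),
        "dropout": rng.uniform(0.0, 0.5),
    }

def validation_loss(cfg):
    # Hypothetical stand-in for "train the network with cfg and return validation loss".
    return ((math.log10(cfg["learning_rate"]) + 3) ** 2
            + (cfg["dropout"] - 0.2) ** 2
            + 0.1 * abs(math.log2(cfg["hidden_units"]) - 8))

def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=validation_loss)

best = random_search(50)
print(best, validation_loss(best))
```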
43
votes
3 answers
Random numbers: set.seed(N) in R
I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results.
But what I don't get is what the values themselves mean. I am playing…
mylesg
- 623
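The seed value itself carries no meaning: it only selects the starting state of the generator, so any integer works and the same integer reproduces the same stream. A minimal sketch of that behavior in Python's `random` module (not R; the same seed will not reproduce R's numbers, because the two seed their generators differently):

```python
import random

def draws(seed, k=5):
    """First k uniform draws from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(k)]

a = draws(123)
b = draws(123)
c = draws(124)
print(a == b)  # same seed, identical stream
print(a == c)  # different seed, different stream
```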
43
votes
3 answers
What is the meaning of a confidence interval taken from bootstrapped resamples?
I've been looking at numerous questions on this site regarding bootstrapping and confidence intervals, but I'm still confused. Part of the reason for my confusion is probably that I'm not advanced enough in my statistics knowledge to understand a…
iarwain
- 433
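One common construction is the percentile interval: resample the data with replacement many times, recompute the statistic on each resample, and take the middle $1-\alpha$ of those values. It is an approximate interval for the population quantity, and other constructions (basic, BCa, studentized) can give different answers. A minimal sketch, assuming i.i.d. data and the mean as the statistic (the data are made up):

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Statistic recomputed on n_boot resamples drawn with replacement, then sorted.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9, 5.0, 4.4]
print(statistics.mean(data), bootstrap_percentile_ci(data))
```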
43
votes
4 answers
What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?
I have tried reading up on different sources, but I am still not clear which test would be appropriate in my case. There are three different questions I am asking about my dataset:
The subjects are tested for infections from X at different…
Anto
- 763
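The key distinction is pairing. If the same subjects are measured twice, the data form a paired $2\times 2$ table, and McNemar's test uses only the discordant counts $b$ and $c$: $\chi^2 = (b-c)^2/(b+c)$ on 1 df (R's `mcnemar.test` applies the continuity-corrected form $(|b-c|-1)^2/(b+c)$ by default). The Pearson chi-squared test of independence instead assumes every count comes from a different, independent unit. A minimal sketch of both statistics (the counts are hypothetical):

```python
def mcnemar_statistic(b, c, continuity=False):
    """McNemar chi-squared (1 df) from the two discordant cell counts of a paired 2x2 table."""
    num = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    return num / (b + c)

def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in an r x c table of unpaired counts."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

# Hypothetical paired counts: b = positive only at time 1, c = positive only at time 2.
print(mcnemar_statistic(10, 2), mcnemar_statistic(10, 2, continuity=True))
print(pearson_chi2([[10, 20], [30, 40]]))
```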
43
votes
1 answer
What does the anova() command do with a lmer model object?
Hopefully this is a question that someone here can answer for me on the nature of decomposing sums of squares from a mixed-effects model fit with lmer (from the lme4 R package).
First off I should say that I am aware of the controversy with using…
Martyn
- 576
43
votes
3 answers
Which variance inflation factor should I be using: $\text{GVIF}$ or $\text{GVIF}^{1/(2\cdot\text{df})}$?
I'm trying to interpret variance inflation factors using the vif function in the R package car. The function prints both a generalised $\text{VIF}$ and also $\text{GVIF}^{1/(2\cdot\text{df})}$. According to the help file, this latter value
To…
jay
- 1,205
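For context, the generalized VIF of Fox and Monette for a term with several columns (for example, a factor's dummy variables) is $\text{GVIF} = \det(R_{11})\det(R_{22})/\det(R)$, where $R$ is the correlation matrix of all predictor columns, $R_{11}$ the block for the term, and $R_{22}$ the block for the remaining columns. For a 1-df term it equals the ordinary VIF, so $\text{GVIF}^{1/(2\cdot\text{df})}$ reduces to $\sqrt{\text{VIF}}$; the exponent puts terms with different df on a comparable scale, and one common reading is to square it before comparing against the usual VIF rules of thumb. A minimal sketch (the correlation matrix is made up):

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        if A[p][i] == 0.0:
            return 0.0
        if p != i:
            A[i], A[p] = A[p], A[i]
            result = -result
        result *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
    return result

def submatrix(R, idx):
    return [[R[i][j] for j in idx] for i in idx]

def gvif(R, term):
    """GVIF of the columns in `term`, given the correlation matrix R of all predictors."""
    rest = [j for j in range(len(R)) if j not in term]
    return det(submatrix(R, term)) * det(submatrix(R, rest)) / det(R)

def adjusted_gvif(R, term):
    """GVIF^(1/(2*df)), where df is the number of columns in the term."""
    return gvif(R, term) ** (1 / (2 * len(term)))

R = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 1.0]]
print(gvif(R, [0]), adjusted_gvif(R, [1, 2]))
```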
43
votes
3 answers
Difference between a SVM and a perceptron
I am a bit confused about the difference between an SVM and a perceptron. Let me try to summarize my understanding here, and please feel free to correct where I am wrong and fill in what I have missed.
The Perceptron does not try to optimize the…
CuriousMind
- 2,253
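The contrast can be stated in terms of the objective. The perceptron updates only on misclassified points and stops at the first hyperplane that separates the training data, while a hard-margin SVM minimizes $\|w\|^2$ subject to $y_i(w^\top x_i + b) \ge 1$, i.e. it picks the separating hyperplane with the largest margin (the soft-margin version adds a hinge-loss penalty). A minimal perceptron sketch on four made-up points, printing the geometric margin of the hyperplane it finds, which an SVM would instead maximize:

```python
import math

def train_perceptron(X, y, max_epochs=100):
    """Classic perceptron with labels in {+1, -1}; updates only on mistakes."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # stops at the first separating hyperplane it reaches
            break
    return w, b

def geometric_margin(w, b, X, y):
    """Smallest signed distance from a training point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm for xi, yi in zip(X, y))

X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b, geometric_margin(w, b, X, y))
```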
43
votes
9 answers
How can I efficiently model the sum of Bernoulli random variables?
I am modeling a random variable ($Y$) which is the sum of some ~15-40k independent Bernoulli random variables ($X_i$), each with a different success probability ($p_i$). Formally, $Y=\sum X_i$ where $\Pr(X_i=1)=p_i$ and $\Pr(X_i=0)=1-p_i$.
I am…
David B
- 1,321
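$Y$ has a Poisson binomial distribution. Its exact pmf can be built one variable at a time with an $O(n^2)$ recursion, and for large $n$ a normal approximation with mean $\sum p_i$ and variance $\sum p_i(1-p_i)$ is often adequate. A minimal pure-Python sketch of both; with 15-40k variables the quadratic loop would be slow in plain Python, so a vectorized or FFT-based version would likely be needed in practice:

```python
import math

def poisson_binomial_pmf(p):
    """Exact pmf of Y = sum of independent Bernoulli(p_i), via O(n^2) dynamic programming."""
    pmf = [1.0]  # distribution of an empty sum: P(Y = 0) = 1
    for pi in p:
        new = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            new[k] += prob * (1 - pi)  # X_i = 0 keeps the count at k
            new[k + 1] += prob * pi    # X_i = 1 moves the count to k + 1
        pmf = new
    return pmf

def normal_moments(p):
    """Mean and standard deviation for the normal approximation to Y."""
    return sum(p), math.sqrt(sum(pi * (1 - pi) for pi in p))

p = [0.1, 0.7, 0.35, 0.9, 0.5]
pmf = poisson_binomial_pmf(p)
mean, sd = normal_moments(p)
print([round(q, 4) for q in pmf], mean, round(sd, 4))
```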
43
votes
4 answers
Estimate quantile of value in a vector
I have a set of real numbers. I need to estimate the quantile of a new number. Is there a clean way to do this in R, or in general?
I hope this is not ultra-trivial ;-)
Your responses are much appreciated.
PK
polarise
- 563
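If "quantile of a new number" means the proportion of observed values at or below it, that is the empirical CDF evaluated at the new value (in R, `ecdf(v)(x)` or equivalently `mean(v <= x)`). A minimal Python sketch using binary search on the sorted values; it returns 0 or 1 for values outside the observed range, so a smoothed estimate may be preferable if that matters:

```python
from bisect import bisect_right

def empirical_quantile(values, x):
    """Fraction of observed values <= x (the empirical CDF evaluated at x)."""
    s = sorted(values)
    return bisect_right(s, x) / len(s)

v = [2.3, 1.7, 4.8, 3.1, 2.9, 5.6, 4.0, 3.3]
print(empirical_quantile(v, 3.0))  # 0.375: three of the eight values are <= 3.0
```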