Highest Voted Questions - Statistical Analysis Stack Exchange

61

votes

5 answers

Neural networks vs support vector machines: are the latter definitely superior?

Many authors of papers I read affirm that SVMs are the superior technique for their regression/classification problem, noting that they couldn't get similar results with NNs. Often the comparison states that SVMs, unlike NNs, have a strong founding…

asked Jun 08 '12 at 02:59

stackovergio

1,055

61

votes

6 answers

Which loss function is correct for logistic regression?

I read about two versions of the loss function for logistic regression. Which of them is correct, and why? From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$: $$l(\beta) =…

asked Dec 11 '16 at 17:05

xtt

744
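
The excerpt above is cut off before its formulas, so the following is only a general illustration and not a reconstruction of the question's equations. It assumes the two conventions often seen for the per-example logistic loss: a cross-entropy form with labels in {0, 1} and a margin form with labels in {-1, +1}. The function names `loss_01` and `loss_pm1` are hypothetical. Under those assumptions, a quick numerical check suggests the two forms give the same value once the labels are mapped onto each other:

```python
import math

def loss_01(score, y):
    """Cross-entropy form, labels y in {0, 1}; score stands for w^T x + b."""
    return -y * score + math.log(1.0 + math.exp(score))

def loss_pm1(score, y):
    """Margin form, labels y in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * score))

# Compare both forms on a few scores, mapping y in {0, 1} to 2y - 1 in {-1, +1}.
max_gap = 0.0
for score in (-2.0, -0.5, 0.0, 1.5):
    for y01 in (0, 1):
        y_pm1 = 2 * y01 - 1
        gap = abs(loss_01(score, y01) - loss_pm1(score, y_pm1))
        max_gap = max(max_gap, gap)

print(max_gap < 1e-12)  # True
```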

61

votes

10 answers

Who are frequentists?

We already had a thread asking who are Bayesians and one asking if frequentists are Bayesians, but there was no thread asking directly who are frequentists? This is a question that was asked by @whuber as a comment to this thread and it begs to be…

asked Aug 29 '16 at 18:48

Tim

138,066

61

votes

5 answers

Apply word embeddings to entire document, to get a feature vector

How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning? A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…

asked Jul 01 '16 at 17:16

D.W.

6,668
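
As a rough illustration of the word-embedding question above, one simple baseline (an assumption here, not necessarily what the thread recommends) is to average the vectors of the words in a document. The toy embedding table, the 3-dimensional vectors, and the function name `document_vector` are all hypothetical:

```python
def document_vector(tokens, embedding, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = embedding.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary words
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]

# Toy embedding with d = 3; real embeddings would have far more dimensions.
toy_embedding = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 2.0, 0.0],
    "bad": [-1.0, 0.0, 2.0],
}

doc = ["good", "movie", "unknownword"]
features = document_vector(doc, toy_embedding, dim=3)
print(features)  # [0.5, 1.0, 1.0]
```

Averaging discards word order, so it is best read as a starting point for a fixed-length feature vector rather than a complete answer.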

61

votes

12 answers

Reference book for linear algebra applied to statistics?

I have been working in R for a bit and have been faced with things like PCA, SVD, QR decompositions and many such linear algebra results (when inspecting the estimation of weighted regressions and such), so I wanted to know if anyone has a recommendation on…

asked Jan 19 '12 at 17:32

Palace Chan

1,003

61

votes

3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may end up drawing different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…

asked Feb 13 '16 at 23:19

Learner

929

61

votes

3 answers

How does saddlepoint approximation work?

How does saddlepoint approximation work? What sort of problem is it good for? (Feel free to use a particular example or examples by way of illustration) Are there any drawbacks, difficulties, things to watch out for, or traps for the unwary?

asked Jan 20 '16 at 01:35

Glen_b

282,281

61

votes

2 answers

A/B tests: z-test vs t-test vs chi square vs fisher exact test

I'm trying to understand the reasoning behind choosing a specific test approach when dealing with a simple A/B test (i.e. two variations/groups with a binary response: converted or not). As an example I will be using the data below Version Visits …

asked Oct 27 '15 at 12:44

L Xandor

1,229
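
The data table in the A/B test excerpt above is truncated, so the counts below are purely hypothetical. As a minimal sketch of how two of the named tests relate, the pooled two-proportion z statistic, once squared, matches the Pearson chi-square statistic for the same 2x2 table (without continuity correction). Function names are illustrative:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled z statistic for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square statistic for the 2x2 table, no continuity correction."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [conv_a + conv_b, total - conv_a - conv_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical counts: 120 of 1000 visits convert in A, 150 of 1000 in B.
z = two_proportion_z(120, 1000, 150, 1000)
chi2 = chi_square_2x2(120, 1000, 150, 1000)
print(abs(z ** 2 - chi2) < 1e-9)  # True
```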

61

votes

1 answer

How to apply standardization/normalization to train- and testset if prediction is the goal?

Do I transform all my data or folds (if CV is applied) at the same time? e.g. (allData - mean(allData)) / sd(allData) Do I transform trainset and testset separately? e.g. (trainData - mean(trainData)) / sd(trainData) (testData - mean(testData)) /…

asked Sep 30 '15 at 12:39

DerTom

807
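
The standardization excerpt above stops before listing all of its options. One widely used convention, offered here as an assumption rather than as the thread's accepted answer, is to estimate the mean and standard deviation on the training set only and reuse them for the test set. The helper names `fit_scaler` and `transform` and the data values are hypothetical:

```python
import statistics

def fit_scaler(train):
    """Estimate the scaling parameters from the training data only."""
    return statistics.mean(train), statistics.stdev(train)

def transform(values, mean, sd):
    """Apply previously estimated parameters to any data set."""
    return [(v - mean) / sd for v in values]

train_data = [2.0, 4.0, 6.0, 8.0]
test_data = [5.0, 10.0]

mean, sd = fit_scaler(train_data)
train_scaled = transform(train_data, mean, sd)
# The test set is scaled with the training mean and sd, not its own.
test_scaled = transform(test_data, mean, sd)
```

With cross-validation, the same idea would be repeated inside each fold, so that the held-out fold never contributes to the scaling parameters.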

61

votes

13 answers

Does 10 heads in a row increase the chance of the next toss being a tail?

I assume the following is true: assuming a fair coin, getting 10 heads in a row whilst tossing a coin does not increase the chance of the next coin toss being a tail, no matter what amount of probability and/or statistical jargon is tossed around…

asked Feb 09 '15 at 08:15

user68492

611
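
The claim in the coin-toss excerpt above can be checked by brute force. This sketch assumes a fair coin with independent tosses, so all 2^11 sequences of 11 tosses are equally likely, and it counts how often a tail follows 10 heads:

```python
from fractions import Fraction
from itertools import product

# Every sequence of 11 tosses; each is equally likely under the fair-coin assumption.
sequences = list(product("HT", repeat=11))

# Keep only sequences that start with 10 heads in a row.
after_streak = [s for s in sequences if all(c == "H" for c in s[:10])]

# Among those, count the ones whose 11th toss is a tail.
tails_next = [s for s in after_streak if s[10] == "T"]

p_tail_given_streak = Fraction(len(tails_next), len(after_streak))
print(p_tail_given_streak)  # 1/2
```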

60

votes

4 answers

Kullback–Leibler vs Kolmogorov-Smirnov distance

I can see that there are a lot of formal differences between the Kullback–Leibler and Kolmogorov-Smirnov distance measures. However, both are used to measure the distance between distributions. Is there a typical situation where one should be used…

asked Apr 07 '11 at 11:39

Greg

703
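
To make one of the formal differences mentioned in the excerpt above concrete, here is a minimal sketch for two discrete distributions on the same ordered support. The probability vectors are hypothetical, and the code assumes `q` is positive wherever `p` is. It shows that the Kullback–Leibler divergence depends on the order of its arguments, while the Kolmogorov-Smirnov distance (largest gap between the two CDFs) does not:

```python
import math
from itertools import accumulate

def kl_divergence(p, q):
    """KL(p || q) = sum p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ks_distance(p, q):
    """Largest absolute gap between the two cumulative distribution functions."""
    return max(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
ks = ks_distance(p, q)
```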

60

votes

4 answers

Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?

Traditional statistical tests, like the two sample t-test, focus on trying to eliminate the hypothesis that there is no difference between a function of two independent samples. Then, we choose a confidence level and say that if the difference of…

asked Feb 08 '14 at 20:55

ryu576

2,540

60

votes

6 answers

Relationship between $R^2$ and correlation coefficient

Let's say I have two 1-dimensional arrays, $a_1$ and $a_2$. Each contains 100 data points. $a_1$ is the actual data, and $a_2$ is the model prediction. In this case, the $R^2$ value would be: $$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}}…

asked Jan 25 '14 at 21:01

Shawn Wang

1,385
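
For the $R^2$ question above, a small sketch may help. It uses the formula quoted in the excerpt, $R^2 = 1 - SS_{res}/SS_{tot}$, on hypothetical arrays much shorter than the 100 points described. When the predictions are arbitrary (here, perfectly correlated with the data but shifted by a constant), this $R^2$ need not equal the squared Pearson correlation:

```python
import math

def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of the actual data."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

a1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # actual data
a2 = [v + 1.0 for v in a1]       # predictions: perfectly correlated but biased

print(r_squared(a1, a2))         # 0.5
print(pearson_r(a1, a2) ** 2)    # 1.0
```

The two quantities do coincide in the special case where the predictions come from a least-squares linear fit with an intercept.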

60

votes

8 answers

Examples where method of moments can beat maximum likelihood in small samples?

Maximum likelihood estimators (MLE) are asymptotically efficient; we see the practical upshot in that they often do better than method of moments (MoM) estimates (when they differ), even at small sample sizes. Here 'better than' means in the sense…

asked Dec 22 '13 at 23:30

Glen_b

282,281

60

votes

4 answers

Is there a test to determine whether GLM overdispersion is significant?

I'm creating Poisson GLMs in R. To check for overdispersion I'm looking at the ratio of residual deviance to degrees of freedom provided by summary(model.name). Is there a cutoff value or test for this ratio to be considered "significant?" I know…

asked Aug 05 '13 at 14:12

kto

725
