Most Popular

1500 questions
61
votes
5 answers

Neural networks vs support vector machines: are the latter definitely superior?

Many authors of the papers I read claim that SVMs are the superior technique for their regression/classification problem, noting that they couldn't get similar results with NNs. Often the comparison states that SVMs, unlike NNs, have a strong founding…
stackovergio
  • 1,055
61
votes
6 answers

Which loss function is correct for logistic regression?

I read about two versions of the loss function for logistic regression. Which of them is correct, and why? From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$: $$l(\beta) =…
xtt
  • 744
61
votes
10 answers

Who are frequentists?

We already had a thread asking who are Bayesians and one asking if frequentists are Bayesians, but there was no thread asking directly who are frequentists? This is a question that was asked by @whuber as a comment to this thread and it begs to be…
Tim
  • 138,066
61
votes
5 answers

Apply word embeddings to entire document, to get a feature vector

How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning? A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…
D.W.
  • 6,668
61
votes
12 answers

Reference book for linear algebra applied to statistics?

I have been working in R for a bit and have been faced with things like PCA, SVD, QR decompositions and many such linear algebra results (when inspecting or estimating weighted regressions and such), so I wanted to know if anyone has a recommendation on…
Palace Chan
  • 1,003
61
votes
3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may arrive at different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…
Learner
  • 929
61
votes
3 answers

How does saddlepoint approximation work?

How does saddlepoint approximation work? What sort of problem is it good for? (Feel free to use a particular example or examples by way of illustration) Are there any drawbacks, difficulties, things to watch out for, or traps for the unwary?
Glen_b
  • 282,281
61
votes
2 answers

A/B tests: z-test vs t-test vs chi square vs fisher exact test

I'm trying to understand the reasoning behind choosing a specific test approach when dealing with a simple A/B test (i.e. two variations/groups with a binary response: converted or not). As an example I will be using the data below: Version Visits …
L Xandor
  • 1,229
61
votes
1 answer

How to apply standardization/normalization to train- and testset if prediction is the goal?

Do I transform all my data or folds (if CV is applied) at the same time? e.g. (allData - mean(allData)) / sd(allData) Do I transform trainset and testset separately? e.g. (trainData - mean(trainData)) / sd(trainData) (testData - mean(testData)) /…
DerTom
  • 807
61
votes
13 answers

Does 10 heads in a row increase the chance of the next toss being a tail?

I assume the following is true: assuming a fair coin, getting 10 heads in a row whilst tossing a coin does not increase the chance of the next coin toss being a tail, no matter what amount of probability and/or statistical jargon is tossed around…
user68492
  • 611
60
votes
4 answers

Kullback–Leibler vs Kolmogorov-Smirnov distance

I can see that there are a lot of formal differences between the Kullback–Leibler and Kolmogorov-Smirnov distance measures. However, both are used to measure the distance between distributions. Is there a typical situation where one should be used…
Greg
  • 703
60
votes
4 answers

Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?

Traditional statistical tests, like the two sample t-test, focus on trying to eliminate the hypothesis that there is no difference between a function of two independent samples. Then, we choose a confidence level and say that if the difference of…
ryu576
  • 2,540
60
votes
6 answers

Relationship between $R^2$ and correlation coefficient

Let's say I have two 1-dimensional arrays, $a_1$ and $a_2$. Each contains 100 data points. $a_1$ is the actual data, and $a_2$ is the model prediction. In this case, the $R^2$ value would be: $$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}}…
Shawn Wang
  • 1,385
60
votes
8 answers

Examples where method of moments can beat maximum likelihood in small samples?

Maximum likelihood estimators (MLE) are asymptotically efficient; we see the practical upshot in that they often do better than method of moments (MoM) estimates (when they differ), even at small sample sizes. Here 'better than' means in the sense…
Glen_b
  • 282,281
60
votes
4 answers

Is there a test to determine whether GLM overdispersion is significant?

I'm creating Poisson GLMs in R. To check for overdispersion I'm looking at the ratio of residual deviance to degrees of freedom provided by summary(model.name). Is there a cutoff value or test for this ratio to be considered "significant?" I know…
kto
  • 725