Most Popular

1500 questions
68 votes, 1 answer

Why is the square root transformation recommended for count data?

It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a response variable…
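The variance-stabilizing argument usually given for the square root can be checked with a short simulation. The sketch below assumes Poisson-distributed counts (an assumption, not something the question states) and uses a hand-rolled Knuth sampler so that it needs only the standard library. The variance of the raw counts tracks the mean, while the variance of $\sqrt{Y}$ settles near $1/4$ as the mean grows.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used here; slow for very large lam.
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def variance_before_and_after_sqrt(lam, n=20000, seed=0):
    """Return (variance of raw counts, variance of square-rooted counts)."""
    rng = random.Random(seed)
    counts = [poisson_sample(lam, rng) for _ in range(n)]
    roots = [math.sqrt(c) for c in counts]
    return statistics.variance(counts), statistics.variance(roots)


results = {lam: variance_before_and_after_sqrt(lam) for lam in (4, 16, 64)}
for lam, (raw_var, root_var) in results.items():
    print(f"mean={lam:>3}  var(Y)={raw_var:7.2f}  var(sqrt(Y))={root_var:.3f}")
```

At small means the stabilization is only approximate (the square-root variance is noticeably above $1/4$ at a mean of 4), which is one reason the GLM alternative mentioned in the question is often preferred.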
68 votes, 4 answers

Why is expectation the same as the arithmetic mean?

Today I came across a new topic called mathematical expectation. The book I am following says that expectation is the arithmetic mean of a random variable drawn from any probability distribution. But it defines expectation as the sum of the products of…
pranphy (reputation 961)
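The two descriptions in the question can be compared on a small example. The sketch below uses a fair six-sided die as a hypothetical random variable: the probability-weighted sum $\sum_x x \, P(X = x)$ gives the expectation exactly, and the arithmetic mean of many simulated rolls approaches the same value, which is the law of large numbers at work.

```python
import random
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
outcomes = range(1, 7)
prob = Fraction(1, 6)  # each face is equally likely

# Expectation as the probability-weighted sum: E[X] = sum of x * P(X = x).
expectation = sum(x * prob for x in outcomes)

# Arithmetic mean of many independent rolls (seeded for reproducibility).
rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(expectation)            # 7/2
print(round(sample_mean, 2))  # close to 3.5
```

When all outcomes are equally likely, the weighted sum reduces to the plain arithmetic mean of the possible values; with unequal probabilities, only the long-run average of observed draws matches it.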
68 votes, 6 answers

What is the difference between estimation and prediction?

For example, I have historical loss data and I am calculating extreme quantiles (Value-at-Risk or Probable Maximum Loss). Are the results I obtain estimates of the loss, or predictions of it? Where does one draw the line? I am confused.
melon (reputation 681)
68 votes, 4 answers

Won't highly-correlated variables in random forest distort accuracy and feature-selection?

As I understand it, highly correlated variables won't cause multicollinearity issues in a random forest model (please correct me if I'm wrong). On the other hand, if I have too many variables containing similar information, will the model…
Yoki (reputation 929)
68 votes, 6 answers

Is the "hybrid" between Fisher and Neyman-Pearson approaches to statistical testing really an "incoherent mishmash"?

There exists a certain school of thought according to which the most widespread approach to statistical testing is a "hybrid" between two approaches: that of Fisher and that of Neyman-Pearson; these two approaches, the claim goes, are "incompatible"…
amoeba (reputation 104,745)
67 votes, 4 answers

Comparing SVM and logistic regression

Can someone please give me some intuition about when to choose SVM or LR? I want to understand the intuition behind the difference between the two methods' optimization criteria for learning the hyperplane, where the respective aims are…
user41799 (reputation 721)
67 votes, 2 answers

Why only three partitions? (training, validation, test)

When you are trying to fit models to a large dataset, the common advice is to partition the data into three parts: the training, validation, and test dataset. This is because the models usually have three "levels" of parameters: the first…
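A random three-way partition of the kind described can be sketched as follows. The 60/20/20 proportions, the seed, and the function name `three_way_split` are illustrative assumptions. The sketch also assumes the records are exchangeable, so it is not appropriate for time series, where a chronological split is usually preferred.

```python
import random


def three_way_split(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle records and cut them into training, validation, and test sets.

    The 60/20/20 default is an illustrative assumption, not a rule.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, validation, test


train, validation, test = three_way_split(range(1000))
print(len(train), len(validation), len(test))  # 600 200 200
```

The first level of parameters is fitted on `train`, choices such as hyperparameters are compared on `validation`, and `test` is touched only once at the end.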
67 votes, 6 answers

Efficient online linear regression

I'm analysing some data on which I would like to perform ordinary linear regression. However, this is not possible because I am dealing with an online setting with a continuous stream of input data (which will quickly become too large for memory) and need to…
mikera (reputation 1,005)
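For a single predictor, ordinary least squares only needs a few running summaries, so the stream never has to be stored. The sketch below keeps running means and co-moments with a Welford-style update, which is more numerically stable than accumulating raw sums of squares. The class name `OnlineSimpleRegression` is hypothetical, and the sketch handles one predictor only.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b*x from a stream, in O(1) memory.

    Keeps running means and co-moments (a Welford-style update), so no
    observation needs to be stored. Handles a single predictor only.
    """

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of cross-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the *old* mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)  # old deviation times new deviation
        self.sxy += dx * (y - self.mean_y)

    def coefficients(self):
        """Return (intercept, slope); needs at least two distinct x values."""
        slope = self.sxy / self.sxx
        intercept = self.mean_y - slope * self.mean_x
        return intercept, slope


model = OnlineSimpleRegression()
for x in range(10):
    model.update(x, 2.0 * x + 1.0)  # noiseless stream on the line y = 1 + 2x
print(model.coefficients())  # approximately (1.0, 2.0)
```

With several predictors, the same idea extends to accumulating $X^\top X$ and $X^\top y$ (or to recursive least squares), which this sketch does not cover.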
67 votes, 14 answers

What is the most surprising characterization of the Gaussian (normal) distribution?

A standardized Gaussian distribution on $\mathbb{R}$ can be defined by giving its density explicitly: $$ \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$ or its characteristic function. As recalled in this question, it is also the only distribution for which the…
robin girard (reputation 6,705)
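The two definitions mentioned in the question can be checked against each other numerically. The sketch below integrates the explicit density with the trapezoidal rule on $[-10, 10]$ (truncating the negligible tails is an assumption of the sketch) and confirms total mass 1, variance 1, and a characteristic function equal to $e^{-t^2/2}$ at one illustrative value of $t$. It does not address the "surprising characterization" itself.

```python
import math


def std_normal_pdf(x):
    """Density of the standardized Gaussian: exp(-x^2 / 2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def integrate(f, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal rule; the tails beyond +/-10 are negligible here."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


mass = integrate(std_normal_pdf)                           # should be 1
variance = integrate(lambda x: x * x * std_normal_pdf(x))  # should be 1
t = 1.5
# The density is symmetric, so the characteristic function is real:
# E[exp(itX)] = E[cos(tX)], which should equal exp(-t^2 / 2).
char_fn = integrate(lambda x: math.cos(t * x) * std_normal_pdf(x))
print(mass, variance, char_fn, math.exp(-t * t / 2.0))
```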
67 votes, 5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM (I'm using scikit-learn):

    from sklearn import svm
    svm = svm.SVC(kernel='linear')
    svm.fit(features, labels)
    svm.coef_

I cannot find anything in the documentation that…
67 votes, 5 answers

What should I do when my neural network doesn't generalize well?

I'm training a neural network and the training loss decreases, but the validation loss doesn't, or it decreases much less than what I would expect, based on references or experiments with very similar architectures and data. How can I fix this? As…
DeltaIV (reputation 17,954)
67 votes, 4 answers

Regression for an outcome (ratio or fraction) between 0 and 1

I am thinking of building a model that predicts a ratio $a/b$, where $0 < a \le b$, so the ratio lies in $(0, 1]$. I could use linear regression, although it doesn't naturally restrict its predictions to $[0, 1]$. I have no reason to believe…
dfrankow (reputation 3,376)
67 votes, 10 answers

What is the difference between prediction and inference?

I'm reading through "An Introduction to Statistical Learning". In Chapter 2, the authors discuss the reasons for estimating a function $f$. From Section 2.1.1, "Why Estimate $f$?": There are two main reasons we may wish to estimate $f$: prediction and inference. We discuss…
67 votes, 2 answers

Are mean normalization and feature scaling needed for k-means clustering?

What are the best (recommended) pre-processing steps before performing k-means?
pedrosaurio (reputation 1,353)
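Why scaling matters for k-means can be seen from Euclidean distances alone. The sketch below uses hypothetical data with one feature in dollars and one in years, and z-scores each column (one common choice; min-max scaling is another). Before scaling, income dominates and decides which point is nearest; after scaling, age carries comparable weight and the nearest neighbour of the first point changes. Whether scaling is appropriate depends on whether the original units are meaningful for the problem.

```python
import math
import statistics

# Hypothetical data: (annual income in dollars, age in years).
points = [(48_000, 25), (52_000, 60), (90_000, 30), (95_000, 62)]


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std dev."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


scaled = standardize(points)
# Raw distances are dominated by income; after scaling, age matters too.
print(math.dist(points[0], points[1]), math.dist(points[0], points[2]))
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```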
67 votes, 7 answers

Where did the frequentist-Bayesian debate go?

The world of statistics was divided between frequentists and Bayesians. These days it seems everyone does a bit of both. How can this be? If the different approaches are suitable for different problems, why did the founding fathers of statistics did…
JohnRos (reputation 5,684)