Most Popular
1500 questions
184
votes
7 answers
How to intuitively explain what a kernel is?
Many machine learning classifiers (e.g. support vector machines) allow one to specify a kernel. What would be an intuitive way of explaining what a kernel is?
One aspect I have been thinking of is the distinction between linear and non-linear…
hashkey
- 1,841
183
votes
10 answers
Bottom to top explanation of the Mahalanobis distance?
I'm studying pattern recognition and statistics, and in almost every book I open on the subject I bump into the concept of Mahalanobis distance. The books give sort of intuitive explanations, but still not good enough ones for me to actually really…
jjepsuomi
- 5,807
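For readers skimming this entry, a minimal sketch of the distance being asked about, restricted to two dimensions so the covariance matrix can be inverted by hand. The function name `mahalanobis_2d` and the example covariance are illustrative assumptions, not taken from the question.

```python
import math

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance sqrt((x - mu)^T Sigma^{-1} (x - mu)) for 2-D data."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(quad)

# Hypothetical covariance: variance 4 along x, variance 1 along y, no correlation.
cov = [[4.0, 0.0], [0.0, 1.0]]
mean = (0.0, 0.0)
d_along_x = mahalanobis_2d((2.0, 0.0), mean, cov)
d_along_y = mahalanobis_2d((0.0, 2.0), mean, cov)
print(d_along_x, d_along_y)
```

Both points are at Euclidean distance 2 from the mean, but the one lying along the high-variance axis comes out closer, because the distance is measured in units of the spread in each direction.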
183
votes
6 answers
Can a probability distribution value exceeding 1 be OK?
On the Wikipedia page about naive Bayes classifiers, there is this line:
$p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve that is equal to 1.)
How can a value $>1$ be OK? I…
babelproofreader
- 4,819
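A small numerical check of the point quoted in this question: a density can exceed 1 at a point while the area under the curve stays at 1. The mean and standard deviation below are illustrative assumptions, not the values behind the Wikipedia example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def trapezoid_area(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

mean, sd = 0.0, 0.2  # illustrative values; a small sd makes the curve tall and narrow
peak = normal_pdf(mean, mean, sd)
area = trapezoid_area(lambda x: normal_pdf(x, mean, sd), mean - 8 * sd, mean + 8 * sd)
print("density at the mean exceeds 1:", peak > 1.0)
print("area under the curve:", round(area, 6))
```

The density value is a height, not a probability; probabilities come from integrating the density over an interval.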
183
votes
2 answers
A list of cost functions used in neural networks, alongside applications
What are common cost functions used in evaluating the performance of neural networks?
Details
(feel free to skip the rest of this question, my intent here is simply to provide clarification on notation that answers may use to help them be more…
Phylliida
- 2,945
182
votes
10 answers
When is it ok to remove the intercept in a linear regression model?
I am running linear regression models and wondering what the conditions are for removing the intercept term.
In comparing results from two different regressions where one has the intercept and the other does not, I notice that the $R^2$ of the…
analyticsPierce
- 2,021
182
votes
4 answers
Choice of K in K-fold cross-validation
I've used $K$-fold cross-validation a few times now to evaluate the performance of some learning algorithms, but I've always been puzzled as to how I should choose the value of $K$.
I've often seen and used a value of $K = 10$, but this seems…
Charles Menguy
- 2,367
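To make the role of $K$ concrete, here is a rough sketch of how $K$-fold splitting partitions sample indices. The helper `k_fold_indices` is a hypothetical name, and the sketch assumes contiguous, unshuffled folds, which real libraries often do not use by default.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A larger $K$ leaves more data in each training set but requires more model fits; with $K$ equal to the number of samples this becomes leave-one-out.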
181
votes
16 answers
Are large data sets inappropriate for hypothesis testing?
In a recent article of Amstat News, the authors (Mark van der Laan and Sherri Rose) stated that "We know that for large enough sample sizes, every study—including ones in which the null hypothesis of no effect is true — will declare a statistically…
Carlos Accioly
- 5,025
181
votes
11 answers
What is the difference between off-policy and on-policy learning?
Artificial intelligence website defines off-policy and on-policy learning as follows:
"An off-policy learner learns the value of the optimal policy independently of the agent's actions. Q-learning is an off-policy learner. An on-policy learner…
cgo
- 9,107
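One common way to see the distinction is to compare the Q-learning update with the SARSA update, a standard on-policy counterpart that the excerpt itself does not name. The sketch below assumes a tabular Q-function stored as nested dictionaries; the states, actions, and values are made up.

```python
def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the best action in s_next, whatever is actually taken."""
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action a_next the behaviour policy really chose."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])

def fresh_table():
    # Hypothetical two-state, two-action value table.
    return {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 5.0},
    }

q_off = fresh_table()
q_learning_update(q_off, "s0", "left", 0.0, "s1", alpha=1.0, gamma=1.0)

q_on = fresh_table()
# Suppose the agent's (exploratory) policy picks "left" in s1.
sarsa_update(q_on, "s0", "left", 0.0, "s1", "left", alpha=1.0, gamma=1.0)

print(q_off["s0"]["left"], q_on["s0"]["left"])
```

With the same transition, the off-policy update backs up the value of the greedy action in `s1`, while the on-policy update backs up the value of the action that was actually selected.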
180
votes
4 answers
Cohen's kappa in plain English
I am reading a data mining book and it mentioned the Kappa statistic as a means for evaluating the prediction performance of classifiers. However, I just can't understand this. I also checked Wikipedia but it didn't help either:…
Jack Twain
- 8,381
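As a worked illustration, Cohen's kappa can be computed from a confusion matrix as $(p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance from the row and column totals. The function name and the counts below are made up for the example.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix given as a list of rows."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: what independent raters with these marginals would reach.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts: rows are true classes, columns are predicted classes.
table = [[20, 5], [10, 15]]
kappa = cohens_kappa(table)
print(round(kappa, 3))
```

Here 70% of the cases agree, but 50% agreement would be expected by chance alone, so kappa reports how much of the remaining headroom the classifier actually captured.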
180
votes
7 answers
What's the difference between variance and standard deviation?
I was wondering what the difference between the variance and the standard deviation is.
If you calculate the two values, it is clear that you get the standard deviation out of the variance, but what does that mean in terms of the distribution you…
Le Max
- 3,729
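A quick sketch of the relationship, using made-up heights in centimetres: the standard deviation is the square root of the variance, which puts it back in the units of the data.

```python
import statistics

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # made-up data

variance = statistics.pvariance(heights_cm)  # population variance, in cm squared
std_dev = statistics.pstdev(heights_cm)      # population standard deviation, in cm

print("variance (cm^2):", variance)
print("standard deviation (cm):", std_dev)
```

Because the standard deviation shares the data's units, it can be read directly against the values themselves, whereas the variance is more convenient algebraically (for example, variances of independent variables add).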
180
votes
2 answers
Deriving the conditional distributions of a multivariate normal distribution
We have a multivariate normal vector ${\boldsymbol Y} \sim \mathcal{N}(\boldsymbol\mu, \Sigma)$. Consider partitioning $\boldsymbol\mu$ and ${\boldsymbol Y}$ into
$$\boldsymbol\mu
=
\begin{bmatrix}
\boldsymbol\mu_1 \\
…
Flying pig
- 6,239
176
votes
11 answers
What is the difference between a neural network and a deep neural network, and why do the deep ones work better?
I haven't seen the question stated precisely in these terms, which is why I am asking a new question.
What I am interested in knowing is not the definition of a neural network, but the actual difference between it and a deep neural network.
For…
Nicolas
- 1,971
175
votes
5 answers
What's the difference between Normalization and Standardization?
At work we were discussing this as my boss has never heard of normalization. In Linear Algebra, Normalization seems to refer to the dividing of a vector by its length. And in statistics, Standardization seems to refer to the subtraction of a mean…
Chris
- 1,759
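The two operations described in this question can be sketched side by side. This assumes "normalization" means scaling a vector to unit length and "standardization" means subtracting the mean and dividing by the standard deviation; other fields use the words differently.

```python
import math
import statistics

def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector]

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

unit = normalize([3.0, 4.0])           # length-5 vector scaled to length 1
z_scores = standardize([2.0, 4.0, 6.0])  # result has mean 0 and standard deviation 1

print(unit)
print(z_scores)
```

Normalization acts on one observation's components, while standardization acts on one variable across observations.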
173
votes
1 answer
How to reverse PCA and reconstruct original variables from several principal components?
Principal component analysis (PCA) can be used for dimensionality reduction. After such dimensionality reduction is performed, how can one approximately reconstruct the original variables/features from a small number of principal…
amoeba
- 104,745
172
votes
3 answers
How are the standard errors of coefficients calculated in a regression?
For my own understanding, I am interested in manually replicating the calculation of the standard errors of estimated coefficients, such as those that come with the output of the lm() function in R, but haven't been able to pin it down. What is the…
ako
- 1,823
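For the single-predictor case, the textbook formulas can be replicated by hand as in the sketch below. It assumes ordinary least squares with an intercept and homoskedastic errors; the function name and data are illustrative, and the general multi-predictor case uses the diagonal of $\hat\sigma^2 (X^\top X)^{-1}$ instead.

```python
import math

def simple_ols_standard_errors(x, y):
    """Fit y = b0 + b1 * x by least squares and return estimates with standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance estimate with n - 2 degrees of freedom (two fitted coefficients).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope

# Made-up data for illustration.
b0, b1, se_b0, se_b1 = simple_ols_standard_errors([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
print(b0, b1, se_b0, se_b1)
```

Under these assumptions the two standard errors should correspond to the "Std. Error" column that a one-predictor `lm()` fit reports.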