Most Popular
1500 questions
75
votes
12 answers
What are some of the most common misconceptions about linear regression?
I'm curious, for those of you who have extensive experience collaborating with other researchers, what are some of the most common misconceptions about linear regression that you encounter?
I think it can be a useful exercise to think about common…
ST21
- 155
75
votes
4 answers
Why bother with the dual problem when fitting SVM?
Given the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ and labels $y_1, \ldots, y_n \in \left \{-1, 1 \right\}$, the hard margin SVM primal problem is
$$ \text{minimize}_{w, w_0} \quad \frac{1}{2} w^T w $$
$$ \text{s.t.} \quad \forall i: y_i…
blubb
- 2,630
75
votes
6 answers
How to statistically compare two time series?
I have two time series, shown in the plot below:
The plot is showing the full detail of both time series, but I can easily reduce it to just the coincident observations if needed.
My question is: What statistical methods can I use to assess the…
robintw
- 2,117
75
votes
4 answers
What makes the Gaussian kernel so magical for PCA, and also in general?
I was reading about kernel PCA (1, 2, 3) with Gaussian and polynomial kernels.
How does the Gaussian kernel separate seemingly any sort of nonlinear data exceptionally well? Please give an intuitive analysis, as well as a mathematically involved…
Simon Kuang
- 2,111
75
votes
12 answers
Hold-out validation vs. cross-validation
To me, it seems that hold-out validation is useless. That is, splitting the original dataset into two parts (training and testing) and using the testing score as a generalization measure is somewhat useless.
K-fold cross-validation seems to give…
user46925
74
votes
4 answers
Testing equality of coefficients from two different regressions
This seems to be a basic issue, but I just realized that I actually don't know how to test equality of coefficients from two different regressions. Can anyone shed some light on this?
More formally, suppose I ran the following two regressions:…
coffeinjunky
- 2,006
74
votes
32 answers
What are the worst (commonly adopted) ideas/principles in statistics?
In my statistical teaching, I encounter some stubborn ideas/principles relating to statistics that have become popularised, yet seem to me to be misleading, or in some cases utterly without merit. I would like to solicit the views of others on this…
Ben
- 124,856
74
votes
6 answers
Why is the Jeffreys prior useful?
I understand that the Jeffreys prior is invariant under re-parameterization. However, what I don't understand is why this property is desired.
Why wouldn't you want the prior to change under a change of variables?
tskuzzy
- 1,003
74
votes
4 answers
Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?
First of all, I realized if I need to perform binary predictions, I have to create at least two classes through performing a one-hot-encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…
infomin101
- 1,733
74
votes
9 answers
What is the difference between discrete data and continuous data?
What is the difference between discrete data and continuous data?
Albort
- 891
74
votes
4 answers
A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?
On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers.
Specifically, they say (formatting and emphasis are mine):
[...] prior to publication,…
amoeba
- 104,745
74
votes
4 answers
What are the differences between 'epoch', 'batch', and 'minibatch'?
As far as I know, when adopting Stochastic Gradient Descent as the learning algorithm,
some people use 'epoch' for the full dataset and 'batch' for the data used in a single update step, while others use 'batch' and 'minibatch' respectively, and still others use…
Tim
- 841
73
votes
4 answers
Look and you shall find (a correlation)
I have several hundred measurements. Now, I am considering utilizing some kind of software to correlate every measure with every measure. This means that there are thousands of correlations. Among these there should (statistically) be a high…
David
- 905
73
votes
15 answers
Good GUI for R suitable for a beginner wanting to learn programming in R?
Is there any GUI for R that makes it easier for a beginner to start learning and programming in that language?
mariana soffer
- 1,101
73
votes
3 answers
How to actually plot a sample tree from randomForest::getTree()?
Anyone got library or code suggestions on how to actually plot a couple of sample trees from:
getTree(rfobj, k, labelVar=TRUE)
(Yes I know you're not supposed to do this operationally, RF is a blackbox, etc etc. I want to visually sanity-check a…
smci
- 1,472