Most Popular
1500 questions
109
votes
13 answers
Understanding "variance" intuitively
What is the cleanest, easiest way to explain the concept of variance to someone? What does it intuitively mean? If one had to explain this to their child, how would one go about it?
It's a concept that I have difficulty in articulating - especially when…
PhD
- 14,627
109
votes
12 answers
Explain "Curse of dimensionality" to a child
I have heard about the curse of dimensionality many times, but somehow I'm still unable to grasp the idea; it's all foggy.
Can anyone explain this in the most intuitive way, as you would explain it to a child, so that I (and others as confused as I am)…
Kobe-Wan Kenobi
- 2,857
109
votes
1 answer
Conditional inference trees vs traditional decision trees
Can anyone explain the primary differences between conditional inference trees (ctree from the party package in R) and the more traditional decision tree algorithms (such as rpart in R)?
What makes CI trees different?
Strengths and…
B_Miner
- 8,630
109
votes
13 answers
Simple algorithm for online outlier detection of a generic time series
I am working with a large number of time series. These time series are basically network measurements arriving every 10 minutes; some of them are periodic (e.g. the bandwidth), while others aren't (e.g. the amount of routing traffic).
I would…
gianluca
- 1,981
108
votes
5 answers
Does the variance of a sum equal the sum of the variances?
Is it (always) true that
$$\mathrm{Var}\left(\sum\limits_{i=1}^m{X_i}\right) = \sum\limits_{i=1}^m{\mathrm{Var}(X_i)} \>?$$
Abe
- 3,811
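As a quick numerical illustration of the question above, here is a minimal sketch (not taken from the thread's answers) that compares the variance of a sum with the sum of the variances for two deliberately correlated samples. The helper names `variance` and `covariance` and the simulated data are assumptions made for this example. It relies on the standard identity $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$, so the two quantities agree only when the covariance term vanishes.

```python
import random

def variance(xs):
    # Population-style variance (divide by n); hypothetical helper for this sketch.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    # Population-style covariance (divide by n); hypothetical helper for this sketch.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
# Simulated data (an assumption): y is built from x, so the two are positively correlated.
x = [random.gauss(0, 1) for _ in range(1000)]
y = [xi + random.gauss(0, 0.5) for xi in x]

var_of_sum = variance([a + b for a, b in zip(x, y)])
sum_of_vars = variance(x) + variance(y)
sum_with_cov = sum_of_vars + 2 * covariance(x, y)

print("Var(X + Y):                  ", var_of_sum)
print("Var(X) + Var(Y):             ", sum_of_vars)
print("Var(X) + Var(Y) + 2 Cov(X,Y):", sum_with_cov)
```

With correlated inputs like these, the first and third values should match up to floating-point error, while the second falls short by twice the covariance.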
108
votes
4 answers
How to select kernel for SVM?
When using SVM, we need to select a kernel.
I wonder how to select a kernel. Any criteria on kernel selection?
xiaohan2012
- 7,179
108
votes
6 answers
Principled way of collapsing categorical variables with many levels?
What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model?
Consider a variable like college student major (discipline chosen by an undergraduate…
shadowtalker
- 12,551
108
votes
5 answers
Loadings vs eigenvectors in PCA: when to use one or another?
In principal component analysis (PCA), we get eigenvectors (unit vectors) and eigenvalues. Now, let us define loadings as $$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}.$$
I know that eigenvectors are just directions and…
user2696565
- 1,389
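To make the definition of loadings quoted above concrete, here is a minimal sketch, assuming a small hypothetical 2x2 covariance matrix so that the eigendecomposition can be written in closed form with the standard library alone. The matrix entries and the helper `unit_eigenvector` are illustrative assumptions, not part of the original question. In practice one would probably use a linear algebra library instead.

```python
import math

# Hypothetical symmetric covariance matrix [[a, b], [b, c]], chosen for illustration.
a, b, c = 4.0, 2.0, 3.0

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
mid = (a + c) / 2
half_gap = math.hypot((a - c) / 2, b)
eigenvalues = [mid + half_gap, mid - half_gap]

def unit_eigenvector(lam):
    # (b, lam - a) solves (A - lam * I) v = 0 when b != 0; rescale to unit length.
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

eigenvectors = [unit_eigenvector(lam) for lam in eigenvalues]

# Loadings = eigenvectors scaled by the square root of their eigenvalues.
loadings = [
    (vx * math.sqrt(lam), vy * math.sqrt(lam))
    for (vx, vy), lam in zip(eigenvectors, eigenvalues)
]

# Summing the outer products of the loading vectors recovers the covariance matrix.
reconstructed = [
    [sum(l[i] * l[j] for l in loadings) for j in range(2)]
    for i in range(2)
]

print("eigenvalues:  ", eigenvalues)
print("eigenvectors: ", eigenvectors)
print("loadings:     ", loadings)
print("reconstructed:", reconstructed)
```

The eigenvectors have unit length and carry direction only, while each loading vector has squared length equal to its eigenvalue, which is why the loadings alone are enough to rebuild the covariance matrix.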
108
votes
1 answer
Correlation between a nominal (IV) and a continuous (DV) variable
I have a nominal variable (different topics of conversation, coded as topic0=0, etc.) and a number of scale variables (DV) such as the length of a conversation.
How can I derive correlations between the nominal and scale variables?
Paul Miller
- 1,081
107
votes
32 answers
What book would you recommend for non-statistician scientists?
What book would you recommend for scientists who are not statisticians?
Clear delivery is most appreciated, as is an explanation of the appropriate techniques and methods for typical tasks: time series analysis, presentation and aggregation of…
SilentGhost
- 329
107
votes
15 answers
US Election results 2016: What went wrong with prediction models?
First it was Brexit, now the US election. Many model predictions were off by a wide margin. Are there lessons to be learned here? As late as 4 pm PST yesterday, the betting markets were still favoring Hillary 4 to 1.
I take it that the betting…
horaceT
- 3,352
107
votes
9 answers
Generate a random variable with a defined correlation to an existing variable(s)
For a simulation study I have to generate random variables that show a predefined (population) correlation to an existing variable $Y$.
I looked into the R packages copula and CDVine, which can produce random multivariate distributions with a given…
Felix S
- 4,700
107
votes
9 answers
Is there an intuitive explanation why multicollinearity is a problem in linear regression?
The wiki discusses the problems that arise when multicollinearity is an issue in linear regression. The basic problem is that multicollinearity results in unstable parameter estimates, which makes it very difficult to assess the effect of independent…
user28
106
votes
3 answers
What are examples where a "naive bootstrap" fails?
Suppose I have a set of sample data from an unknown or complex distribution, and I want to perform some inference on a statistic $T$ of the data. My default inclination is to just generate a bunch of bootstrap samples with replacement, and calculate…
raegtin
- 9,930
106
votes
19 answers
How to annoy a statistical referee?
I recently asked a question regarding general principles around reviewing statistics in papers. What I would now like to ask is what particularly irritates you when reviewing a paper, i.e. what's the best way to really annoy a statistical…
csgillespie
- 13,029