Most Popular
1500 questions
47
votes
3 answers
Why is RSS distributed chi square times n-p?
I would like to understand why, under the OLS model, the RSS (residual sum of squares) is distributed $$\chi^2\cdot (n-p)$$ ($p$ being the number of parameters in the model, $n$ the number of observations).
I apologize for asking such a basic…
Tal Galili
- 21,541
47
votes
3 answers
What is the relationship between orthogonal, correlation and independence?
I've read an article saying that when using planned contrasts to find means that are different in a one-way ANOVA, contrasts should be orthogonal so that they are uncorrelated and prevent the type I error from being inflated.
I don't understand…
Carl Levasseur
- 623
47
votes
5 answers
Lift measure in data mining
I searched many websites to find out what exactly lift does. The results I found were all about using it in applications, not about the measure itself.
I know about the support and confidence functions. From Wikipedia, in data mining, lift is a measure of the…
Nickool
- 625
47
votes
2 answers
Difference between LOESS and LOWESS
What is the difference between LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing)? From Wikipedia I can only see that LOESS is a generalization of LOWESS. Do they have slightly different parameters?
pir
- 5,056
47
votes
9 answers
Is it valid to include a baseline measure as control variable when testing the effect of an independent variable on change scores?
I am attempting to run an OLS regression:
DV: Change in weight over a year (initial weight - end weight)
IV: Whether or not you exercise.
However, it seems reasonable that heavier people will lose more weight per unit of exercise than thinner…
ChrisStata
- 621
47
votes
3 answers
Are pooling layers added before or after dropout layers?
I'm creating a convolutional neural network (CNN), where I have a convolutional layer followed by a pooling layer and I want to apply dropout to reduce overfitting. I have this feeling that the dropout layer should be applied after the pooling…
pir
- 5,056
47
votes
5 answers
Why is multiple comparison a problem?
I find it hard to understand what the issue with multiple comparisons really is. With a simple analogy, it is said that a person who makes many decisions will make many mistakes. So a very conservative precaution is applied, like Bonferroni…
AgCl
- 613
47
votes
1 answer
PCA and Correspondence analysis in their relation to Biplot
Biplot is often used to display results of principal component analysis (and of related techniques). It is a dual or overlay scatterplot showing component loadings and component scores simultaneously. I was informed by @amoeba today that he has…
ttnphns
- 57,480
47
votes
6 answers
When to use simulations?
So this is a very simple and basic question. However, when I was in school, I paid very little attention to the whole concept of simulations in class and that's left me a little terrified of that process.
Can you explain the simulation process in…
AMathew
- 1,060
47
votes
2 answers
Differences between Bhattacharyya distance and KL divergence
I'm looking for an intuitive explanation for the following questions:
In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability…
JewelSue
- 573
47
votes
5 answers
What exactly is a Bayesian model?
Can I call a model wherein Bayes' Theorem is used a "Bayesian model"? I am afraid such a definition might be too broad.
So what exactly is a Bayesian model?
Sibbs Gambling
- 2,609
47
votes
5 answers
Statistical models cheat sheet
I was wondering if there is a statistical model "cheat sheet" (or several) that lists some or all of the following information:
when to use the model
when not to use the model
required and optional inputs
expected outputs
has the model been tested in different fields…
dassouki
- 1,429
47
votes
3 answers
How does R handle missing values in lm?
I'd like to regress a vector B against each of the columns in a matrix A. This is trivial if there are no missing data, but if matrix A contains missing values, then my regression against A is constrained to include only rows where all values are…
David Quigley
- 573
47
votes
6 answers
What is your favorite statistical graph?
This is a favorite of mine
This example is in a humorous vein (credit goes to a former professor of mine, Steven Gortmaker), but I am also interested in graphs that you feel beautifully capture and communicate a statistical insight or method, along…
Alexis
- 29,850
47
votes
4 answers
How to interpret mean of Silhouette plot?
I'm trying to use a silhouette plot to determine the number of clusters in my dataset. Given the dataset Train, I used the following MATLAB code:
Train_data = full(Train);
Result = [];
for num_of_cluster = 1:20
centroid =…
Learner
- 4,457