Most Popular
1500 questions
56
votes
6 answers
How are propensity scores different from adding covariates in a regression, and when are they preferred to the latter?
I admit I'm relatively new to propensity scores and causal analysis.
One thing that's not obvious to me as a newcomer is how the "balancing" using propensity scores is mathematically different from what happens when we add covariates in a…
Frank Barry
- 731
56
votes
3 answers
Empirical justification for the one standard error rule when using cross-validation
Are there any empirical studies justifying the use of the one standard error rule in favour of parsimony? Obviously it depends on the data-generating process, but anything which analyses a large corpus of datasets would be a very…
DavidShor
- 1,491
56
votes
5 answers
How to assess the similarity of two histograms?
Given two histograms, how do we assess whether they are similar or not?
Is it sufficient to simply look at the two histograms?
The simple one-to-one mapping has the problem that if a histogram is slightly different and slightly shifted then we'll…
Mew 3.4
- 671
56
votes
1 answer
Logistic regression: anova chi-square test vs. significance of coefficients (anova() vs summary() in R)
I have a logistic GLM with 8 variables. I ran a chi-square test in R with anova(glm.model, test='Chisq'), and 2 of the variables turn out to be predictive when ordered at the top of the test and not so much when ordered at the bottom. The…
StreetHawk
- 563
56
votes
5 answers
What do "endogeneity" and "exogeneity" mean substantively?
I understand that the basic definition of endogeneity is that
$$
X'\epsilon=0
$$
is not satisfied, but what does this mean in a real world sense? I read the Wikipedia article, with the supply and demand example, trying to make sense of it, but it…
user25901
- 561
56
votes
2 answers
Using lmer for repeated-measures linear mixed-effect model
EDIT 2: I originally thought I needed to run a two-factor ANOVA with repeated measures on one factor, but I now think a linear mixed-effect model will work better for my data. I think I nearly know what needs to happen, but am still confused by a few…
phosphorelated
- 793
56
votes
2 answers
How to interpret p-value of Kolmogorov-Smirnov test (python)?
I have two samples and want to test (using Python) whether they are drawn from the same distribution. To do that I use the function ks_2samp from scipy.stats. It returns two values and I have difficulty interpreting them.
Help please!
meri
- 561
56
votes
12 answers
Is there a 1 in 20 or 1 in 400 chance of guessing the outcome of a d20 roll before it happens?
My friends are in a bit of an argument over Dungeons & Dragons.
My player managed to guess the outcome of a D20 roll before it happened, and my friend said that his chance of guessing the number was 1 in 20. Another friend argues that his chance of…
Theguy Whatguys
- 643
56
votes
2 answers
Pandas / Statsmodel / Scikit-learn
Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another?
Which of these has the most comprehensive functionality?
Which one is actively developed…
Nik
- 1,389
56
votes
8 answers
R libraries for deep learning
I was wondering if there are any good R libraries out there for deep learning neural networks? I know there are nnet, neuralnet, and RSNNS, but none of these seem to implement deep learning methods.
I'm especially interested in unsupervised…
Zach
- 23,766
56
votes
3 answers
Consider the sum of $n$ uniform distributions on $[0,1]$, or $Z_n$. Why does the cusp in the PDF of $Z_n$ disappear for $n \geq 3$?
I've been wondering about this one for a while; I find it a little weird how abruptly it happens. Basically, why do we need just three uniforms for $Z_n$ to smooth out like it does? And why does the smoothing-out happen so relatively…
tetragrammaton
- 1,446
56
votes
3 answers
Multivariate linear regression vs neural network?
It seems that it is possible to get similar results to a neural network with a multivariate linear regression in some cases, and multivariate linear regression is super fast and easy.
Under what circumstances can neural networks give better results…
Hugh Perkins
- 4,697
56
votes
9 answers
Is it wrong to rephrase "1 in 80 deaths is caused by a car accident" as "1 in 80 people die as a result of a car accident?"
Statement One (S1): "One in 80 deaths is caused by a car accident."
Statement Two (S2): "One in 80 people dies as a result of a car accident."
Now, I personally don't see very much difference at all between these two statements. When writing, I…
56
votes
5 answers
Prediction in Cox regression
I am doing a multivariate Cox regression and have my significant independent variables and beta values. The model fits my data very well.
Now, I would like to use my model and predict the survival of a new observation.
I am unclear how to do this…
Marja
- 563
56
votes
4 answers
Normalization vs. scaling
What is the difference between data 'Normalization' and data 'Scaling'? Until now I thought both terms refer to the same process, but now I realize there is something more that I don't know/understand. Also if there is a difference between Normalization…
d.putto
- 921