Most Popular

1500 questions
35 votes · 1 answer

Multiple Imputation by Chained Equations (MICE) Explained

I have seen Multiple Imputation by Chained Equations (MICE) used as a missing data handling method. Is anyone able to provide a simple explanation of how MICE works?
Mike Tauber
  • 1,037
35 votes · 4 answers

Why isn't RANSAC most widely used in statistics?

Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers. However, I've never seen it used by statisticians, and I've always been under the impression…
Bossykena
  • 677
35 votes · 3 answers

What is the difference between kernel, bias, and activity regularizers, and when to use which?

I've read this post, but I wanted more clarification for a broader question. In Keras, there are now three types of regularizers for a layer: kernel_regularizer, bias_regularizer, activity_regularizer. I have read posts that explain the difference…
Christian
  • 1,872
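
A minimal sketch of where the three regularizer arguments named in the question above attach to a layer, written in R (the only language quoted elsewhere on this page) and assuming the keras R package is available; the layer sizes and penalty strengths are arbitrary illustrations, not recommendations:

    # Sketch only: assumes the 'keras' R package (R interface to Keras) is installed.
    library(keras)

    model <- keras_model_sequential() %>%
      layer_dense(
        units = 64,
        activation = "relu",
        input_shape = c(20),
        kernel_regularizer   = regularizer_l2(0.01),  # penalty on the weight matrix (the "kernel")
        bias_regularizer     = regularizer_l2(0.01),  # penalty on the bias vector
        activity_regularizer = regularizer_l1(0.01)   # penalty on the layer's output activations
      ) %>%
      layer_dense(units = 1)

Each of these penalties is added to the training loss; which one to reach for is exactly what the answers to the question discuss.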
35 votes · 4 answers

Difference between rungs two and three in the Ladder of Causation

In Judea Pearl's "Book of Why" he talks about what he calls the Ladder of Causation, which is essentially a hierarchy comprising different levels of causal reasoning. The lowest is concerned with patterns of association in observed data (e.g.,…
dsaxton
  • 12,138
35 votes · 1 answer

Why is PCA sensitive to outliers?

There are many posts on this SE that discuss robust approaches to principal component analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.
Psi
  • 482
35 votes · 5 answers

How to handle a "self-defeating" prediction model?

I was watching a presentation by an ML specialist from a major retailer, where they had developed a model to predict out-of-stock events. Let's assume for a moment that, over time, their model becomes very accurate; wouldn't that somehow be…
Skander H.
  • 11,888
35 votes · 3 answers

Propensity score matching after multiple imputation

I refer to this paper: Hayes JR, Groner JI. "Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data." J Pediatr Surg. 2008 May;43(5):924-7. In this study,…
Joe King
  • 3,805
35 votes · 1 answer

Why does glmnet use "naive" elastic net from the Zou & Hastie original paper?

The original elastic net paper, Zou & Hastie (2005) "Regularization and variable selection via the elastic net", introduced the elastic net loss function for linear regression (here I assume all variables are centered and scaled to unit variance):…
amoeba
  • 104,745
35 votes · 4 answers

How is the Poisson distribution different from the normal distribution?

I have generated a vector which has a Poisson distribution, as follows: x = rpois(1000, 10). If I make a histogram using hist(x), the distribution looks like the familiar bell-shaped normal distribution. However, a Kolmogorov-Smirnov test…
luciano
  • 14,269
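
A runnable R sketch of the comparison the question describes; the seed, sample size, and number of histogram bins are arbitrary choices added here:

    set.seed(1)
    x <- rpois(1000, lambda = 10)   # Poisson(10): mean and variance are both 10
    hist(x, breaks = 30)            # looks roughly bell-shaped for a large mean

    # Kolmogorov-Smirnov test against a normal with the sample's mean and sd.
    # Note that ks.test() assumes a continuous distribution, so it will warn
    # about ties when applied to discrete Poisson data.
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))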
35 votes · 5 answers

What are "residual connections" in RNNs?

In Google's paper Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, it is stated: "Our LSTM RNNs have $8$ layers, with residual connections between layers ..." What are residual connections? Why…
user82135
35 votes · 3 answers

Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?

My understanding is that when using a Bayesian approach to estimate parameter values: The posterior distribution is the combination of the prior distribution and the likelihood distribution. We simulate this by generating a sample from the…
Dave
  • 2,611
35 votes · 5 answers

What problem do oversampling, undersampling, and SMOTE solve?

In a recent, well-received question, Tim asks: when is unbalanced data really a problem in Machine Learning? The premise of the question is that there is a lot of machine learning literature discussing class balance and the problem of imbalanced…
Matthew Drury
  • 35,629
35 votes · 5 answers

Is an overfitted model necessarily useless?

Assume that a model has 100% accuracy on the training data, but 70% accuracy on the test data. Is the following argument true about this model? It is obvious that this is an overfitted model. The test accuracy can be enhanced by reducing the…
Hossein
  • 3,454
35 votes · 5 answers

Modelling longitudinal data where the effect of time varies in functional form between individuals

Context: Imagine you had a longitudinal study which measured a dependent variable (DV) once a week for 20 weeks on 200 participants. Although I'm interested in the general case, typical DVs that I'm thinking of include job performance following hire or…
Jeromy Anglim
  • 44,984
35 votes · 6 answers

Why is the expected value so named?

I understand how we get 3.5 as the expected value for rolling a fair 6-sided die. But intuitively, I can expect each face with an equal chance of 1/6. So shouldn't the expected value of rolling a die be any of the numbers between 1 and 6 with equal…