Most Popular
1500 questions
132
votes
4 answers
What does a "closed-form solution" mean?
I have come across the term "closed-form solution" quite often. What does a closed-form solution mean? How does one determine if a closed-form solution exists for a given problem? Searching online, I found some information, but nothing in the context…
arjsgh21
- 2,633
132
votes
14 answers
Maximum Likelihood Estimation (MLE) in layman terms
Could anyone explain maximum likelihood estimation (MLE) to me in detail, in layman's terms? I would like to know the underlying concept before going into the mathematical derivation or equations.
StatsUser
- 1,809
131
votes
4 answers
When to use gamma GLMs?
The gamma distribution can take on a pretty wide range of shapes, and given the link between the mean and the variance through its two parameters, it seems suited to dealing with heteroskedasticity in non-negative data, in a way that log-transformed…
generic_user
- 13,339
131
votes
5 answers
What are the main differences between K-means and K-nearest neighbours?
I know that k-means is unsupervised and is used for clustering, etc., and that k-NN is supervised. But what are the concrete differences between the two?
nsc010
- 1,657
131
votes
4 answers
Softmax vs Sigmoid function in Logistic classifier?
What decides the choice of function (Softmax vs Sigmoid) in a Logistic classifier?
Suppose there are 4 output classes. Each of the above functions gives the probability of each class being the correct output. So which one to take for a…
mach
- 1,805
131
votes
4 answers
PCA and proportion of variance explained
In general, what is meant by saying that the fraction $x$ of the variance in an analysis like PCA is explained by the first principal component? Can someone explain this intuitively but also give a precise mathematical definition of what "variance…
user9097
- 3,263
131
votes
4 answers
Differences between cross validation and bootstrapping to estimate the prediction error
I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error.
Does one work better for small dataset sizes or large datasets?
grant
- 1,531
131
votes
6 answers
How would you explain the difference between correlation and covariance?
The question "How would you explain covariance to someone who understands only the mean?", which addresses the issue of explaining covariance to a lay person, brought up a similar question in my mind.
How would one explain to a…
pmgjones
- 5,773
131
votes
5 answers
Using k-fold cross-validation for time-series model selection
Question:
I want to be sure of something: is the use of k-fold cross-validation with time series straightforward, or does one need to pay special attention before using it?
Background:
I'm modeling a time series of 6 years (with semi-Markov…
Mickaël S
- 1,468
130
votes
9 answers
Numerical example to understand Expectation-Maximization
I am trying to get a good grasp on the EM algorithm, to be able to implement and use it. I spent a full day reading the theory and a paper where EM is used to track an aircraft using the position information coming from a radar. Honestly, I don't…
arjsgh21
- 2,633
130
votes
6 answers
Difference between confidence intervals and prediction intervals
For a prediction interval in linear regression you still use $\hat{E}[Y|x] = \hat{\beta}_0+\hat{\beta}_1 x$ to generate the interval. You also use this to generate a confidence interval of $E[Y|x_0]$. What's the difference between the two?
question
- 1,485
128
votes
22 answers
Most interesting statistical paradoxes
Because I find them fascinating, I'd like to hear what folks in this community find as the most interesting statistical paradox and why.
Nick
- 3,537
128
votes
6 answers
What loss function for multi-class, multi-label classification tasks in neural networks?
I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label).
I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT
- 1,381
127
votes
5 answers
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
TL;DR
See title.
Motivation
I am hoping for a canonical answer along the lines of "(1) No, (2) Not applicable, because (1)", which we can use to close many wrong questions about unbalanced datasets and oversampling. I would be quite as happy to be…
Stephan Kolassa
- 123,354