Most Popular
1500 questions
43
votes
4 answers
What are the advantages of stacking multiple LSTMs?
What are the advantages of using multiple LSTMs, stacked one after another, in a deep network? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through…
wordSmith
- 745
43
votes
1 answer
Relative variable importance for Boosting
I'm looking for an explanation of how relative variable importance is computed in Gradient Boosted Trees that is not overly general/simplistic like:
The measures are based on the number of times a variable is selected for splitting, weighted by the…
Antoine
- 6,159
43
votes
2 answers
Variance of product of dependent variables
What is the formula for variance of product of dependent variables?
In the case of independent variables the formula is simple:
$$ \operatorname{var}(XY) = E(X^2Y^2) - E(XY)^2 = \operatorname{var}(X) \operatorname{var}(Y) +…
Riga
- 133
43
votes
6 answers
Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey
I am used to seeing the Ljung-Box test used quite frequently for testing autocorrelation in raw data or in model residuals. I had nearly forgotten that there is another test for autocorrelation, namely the Breusch-Godfrey test.
Question: what are the main…
Richard Hardy
- 67,272
43
votes
3 answers
When should one use Coordinate descent vs. gradient descent?
I was wondering what the different use cases are for the two algorithms, Coordinate Descent and Gradient Descent.
I know that coordinate descent has problems with non-smooth functions but it is used in popular algorithms like SVM and LASSO.
Gradient…
Bar
- 2,862
43
votes
5 answers
LDA vs word2vec
I am trying to understand the similarities between Latent Dirichlet Allocation and word2vec for calculating word similarity.
As I understand, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…
Piotr Migdal
- 5,776
43
votes
2 answers
Finding Quartiles in R
I'm working through a statistics textbook while learning R and I've run into a stumbling block on the following example:
After looking at ?quantile I attempted to recreate this in R with the following:
> nuclear <- c(7, 20, 16, 6, 58, 9, 20, 50,…
user60305
43
votes
3 answers
What are the measures for accuracy of multilabel data?
Consider a scenario where you are provided with a KnownLabel matrix and a PredictedLabel matrix. I would like to measure the goodness of the PredictedLabel matrix against the KnownLabel matrix.
But the challenge here is that the KnownLabel matrix has few…
Learner
- 4,457
43
votes
9 answers
When teaching statistics, use "normal" or "Gaussian"?
I use mostly "Gaussian distribution" in my book, but someone just suggested I switch to "normal distribution". Any consensus on which term to use for beginners?
Of course the two terms are synonyms, so this is not a question about substance, but…
Harvey Motulsky
- 20,456
42
votes
2 answers
When is logistic regression solved in closed form?
Take $x \in \{0,1\}^d$ and $y \in \{0,1\}$ and suppose we model the task of predicting y given x using logistic regression. When can logistic regression coefficients be written in closed form?
One example is when we use a saturated model.
That is,…
Yaroslav Bulatov
- 6,199
42
votes
3 answers
Distribution of scalar products of two random unit vectors in $D$ dimensions
If $\mathbf{x}$ and $\mathbf{y}$ are two independent random unit vectors in $\mathbb{R}^D$ (uniformly distributed on a unit sphere), what is the distribution of their scalar product (dot product) $\mathbf x \cdot \mathbf y$?
I guess as $D$ grows the…
amoeba
- 104,745
42
votes
4 answers
Justification of one-tailed hypothesis testing
I understand two-tailed hypothesis testing. You have $H_0 : \theta = \theta_0$ (vs. $H_1 = \neg H_0 : \theta \ne \theta_0$). The $p$-value is the probability that $\theta$ generates data at least as extreme as what was observed.
I don't understand…
xyzzyrz
- 3,161
42
votes
2 answers
Error "system is computationally singular" when running a glm
I'm using the robustbase package to run a glm estimation. However when I do it, I get the following error:
Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) :
system is computationally singular: reciprocal condition number =…
NK1
- 603
42
votes
8 answers
Looking for a good and complete probability and statistics book
I never had the opportunity to take a stats course from a math faculty. I am looking for a probability theory and statistics book that is complete and self-sufficient. By complete I mean that it contains all the proofs and does not just state results.…
Julian Karch
- 1,890
42
votes
4 answers
Good methods for density plots of non-negative variables in R?
plot(density(rexp(100)))
Obviously all density to the left of zero represents bias.
I'm looking to summarize some data for non-statisticians, and I want to avoid questions about why non-negative data has density to the left of zero. The plots are…
generic_user
- 13,339