Highest Voted Questions - Statistical Analysis Stack Exchange

43

votes

4 answers

What are the advantages of stacking multiple LSTMs?

What are the advantages of using multiple LSTMs, stacked side by side, in a deep network? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through…

asked Jul 27 '15 at 01:57

wordSmith

745

43

votes

1 answer

Relative variable importance for Boosting

I'm looking for an explanation of how relative variable importance is computed in Gradient Boosted Trees that is not overly general/simplistic like: The measures are based on the number of times a variable is selected for splitting, weighted by the…

asked Jul 19 '15 at 13:29

Antoine

6,159

43

votes

2 answers

Variance of product of dependent variables

What is the formula for variance of product of dependent variables? In the case of independent variables the formula is simple: $$ \operatorname{var}(XY) = E(X^2Y^2) - E(XY)^2 = \operatorname{var}(X) \operatorname{var}(Y) +…

asked Sep 23 '11 at 16:23

Riga

133

43

votes

6 answers

Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey

I am used to seeing the Ljung-Box test used quite frequently for testing autocorrelation in raw data or in model residuals. I had nearly forgotten that there is another test for autocorrelation, namely the Breusch-Godfrey test. Question: what are the main…

asked Apr 23 '15 at 19:24

Richard Hardy

67,272

43

votes

3 answers

When should one use Coordinate descent vs. gradient descent?

I was wondering what the different use cases are for the two algorithms, Coordinate Descent and Gradient Descent. I know that coordinate descent has problems with non-smooth functions but it is used in popular algorithms like SVM and LASSO. Gradient…

asked Apr 14 '15 at 14:38

Bar

2,862

43

votes

5 answers

LDA vs word2vec

I am trying to understand the similarity between Latent Dirichlet Allocation and word2vec for calculating word similarity. As I understand, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…

asked Apr 09 '15 at 09:17

Piotr Migdal

5,776

43

votes

2 answers

Finding Quartiles in R

I'm working through a statistics textbook while learning R and I've run into a stumbling block on the following example: After looking at ?quantile I attempted to recreate this in R with the following: > nuclear <- c(7, 20, 16, 6, 58, 9, 20, 50,…

asked Jan 20 '15 at 16:42

user60305

43

votes

3 answers

What are the measures for accuracy of multilabel data?

Consider a scenario where you are provided with a KnownLabel matrix and a PredictedLabel matrix. I would like to measure the goodness of the PredictedLabel matrix against the KnownLabel matrix. But the challenge here is that the KnownLabel matrix has few…

asked Jul 06 '11 at 05:05

Learner

4,457

43

votes

9 answers

When teaching statistics, use "normal" or "Gaussian"?

I use mostly "Gaussian distribution" in my book, but someone just suggested I switch to "normal distribution". Any consensus on which term to use for beginners? Of course the two terms are synonyms, so this is not a question about substance, but…

asked Sep 08 '14 at 23:43

Harvey Motulsky

20,456

42

votes

2 answers

When is logistic regression solved in closed form?

Take $x \in \{0,1\}^d$ and $y \in \{0,1\}$ and suppose we model the task of predicting y given x using logistic regression. When can logistic regression coefficients be written in closed form? One example is when we use a saturated model. That is,…

asked Jul 28 '10 at 21:59

Yaroslav Bulatov

6,199

42

votes

3 answers

Distribution of scalar products of two random unit vectors in $D$ dimensions

If $\mathbf{x}$ and $\mathbf{y}$ are two independent random unit vectors in $\mathbb{R}^D$ (uniformly distributed on a unit sphere), what is the distribution of their scalar product (dot product) $\mathbf x \cdot \mathbf y$? I guess as $D$ grows the…

asked Feb 08 '14 at 22:33

amoeba

104,745

42

votes

4 answers

Justification of one-tailed hypothesis testing

I understand two-tailed hypothesis testing. You have $H_0 : \theta = \theta_0$ (vs. $H_1 = \neg H_0 : \theta \ne \theta_0$). The $p$-value is the probability that $\theta$ generates data at least as extreme as what was observed. I don't understand…

hypothesis-testing

asked Mar 03 '11 at 19:35

xyzzyrz

3,161

42

votes

2 answers

Error "system is computationally singular" when running a glm

I'm using the robustbase package to run a glm estimation. However when I do it, I get the following error: Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) : system is computationally singular: reciprocal condition number =…

asked Nov 13 '13 at 18:11

NK1

603

42

votes

8 answers

Looking for a good and complete probability and statistics book

I never had the opportunity to take a stats course from a math faculty. I am looking for a probability theory and statistics book that is complete and self-sufficient. By complete I mean that it contains all the proofs and does not just state results.…

asked Sep 19 '13 at 22:14

Julian Karch

1,890

42

votes

4 answers

Good methods for density plots of non-negative variables in R?

plot(density(rexp(100))) Obviously all density to the left of zero represents bias. I'm looking to summarize some data for non-statisticians, and I want to avoid questions about why non-negative data has density to the left of zero. The plots are…

asked Jul 29 '13 at 06:57

generic_user

13,339
