Highest Voted Questions - Statistical Analysis Stack Exchange

72

votes

5 answers

What is so cool about de Finetti's representation theorem?

From Theory of Statistics by Mark J. Schervish (page 12): Although DeFinetti's representation theorem 1.49 is central to motivating parametric models, it is not actually used in their implementation. How is the theorem central to parametric…

asked Aug 16 '12 at 17:40

gui11aume

14,703

72

votes

7 answers

Why is tanh almost always better than sigmoid as an activation function?

In Andrew Ng's Neural Networks and Deep Learning course on Coursera he says that using $\tanh$ is almost always preferable to using the sigmoid. The reason he gives is that the outputs using $\tanh$ centre around 0 rather than the sigmoid's 0.5, and this…

asked Feb 26 '18 at 08:45

Tom Hale

2,561
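
The centring claim in the excerpt above can be checked numerically. A minimal sketch in plain Python, assuming a symmetric grid of inputs on [-3, 3]; the grid and the helper name sigmoid are illustrative choices, not taken from the course or the question:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical inputs: 601 evenly spaced points on [-3, 3], symmetric around 0.
xs = [i / 100.0 for i in range(-300, 301)]

# Average activation over the symmetric grid.
mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)

print(f"mean tanh output:    {mean_tanh:.3f}")
print(f"mean sigmoid output: {mean_sigmoid:.3f}")
```

On symmetric inputs the tanh outputs average out to roughly 0 and the sigmoid outputs to roughly 0.5, which is the property the excerpt refers to. The sketch says nothing by itself about whether training is faster.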

72

votes

8 answers

How to simulate data that satisfy specific constraints such as having specific mean and standard deviation?

This question is motivated by my question on meta-analysis. But I imagine that it would also be useful in teaching contexts where you want to create a dataset that exactly mirrors an existing published dataset. I know how to generate random data…

asked Jun 12 '12 at 11:03

Jeromy Anglim

44,984

72

votes

7 answers

What is a "saturated" model?

What is meant when we say we have a saturated model?

asked Jul 20 '10 at 12:09

Graham Cookson

8,061

72

votes

4 answers

What is the proper usage of scale_pos_weight in xgboost for imbalanced datasets?

I have a very imbalanced dataset. I'm trying to follow the tuning advice and use scale_pos_weight, but I'm not sure how I should tune it. I can see that RegLossObj.GetGradient does: if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight so a gradient…

asked Oct 30 '16 at 13:59

ihadanny

3,300

72

votes

2 answers

How to interpret type I, type II, and type III ANOVA and MANOVA?

My primary question is how to interpret the output (coefficients, F, P) when conducting a Type I (sequential) ANOVA? My specific research problem is a bit more complex, so I will break my example into parts. First, if I am interested in the effect…

asked Jan 01 '12 at 18:28

djhocking

1,931

72

votes

4 answers

How do you calculate the probability density function of the maximum of a sample of IID uniform random variables?

Given the random variable $$Y = \max(X_1, X_2, \ldots, X_n)$$ where $X_i$ are IID uniform variables, how do I calculate the PDF of $Y$?

asked Nov 15 '11 at 19:34

Mascarpone

861
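
One way to approach the question above, sketched under the assumption (not stated in the excerpt) that the $X_i$ are uniform on $[0, 1]$: the maximum is at most $y$ exactly when every $X_i$ is at most $y$, so independence gives the CDF as a product, and differentiating the CDF gives the PDF.

$$F_Y(y) = P(X_1 \le y, \ldots, X_n \le y) = \prod_{i=1}^{n} P(X_i \le y) = y^n, \qquad f_Y(y) = \frac{d}{dy} F_Y(y) = n\,y^{n-1}, \quad 0 \le y \le 1.$$

For a different interval the same two steps apply with the corresponding uniform CDF in place of $y$.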

72

votes

2 answers

Derivation of closed form lasso solution

For the lasso problem $\min_\beta (Y-X\beta)^T(Y-X\beta)$ such that $\|\beta\|_1 \leq t$, I often see the soft-thresholding result $$ \beta_j^{\text{lasso}}= \mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\gamma)^+ $$ for the orthonormal…

lasso

asked Nov 01 '11 at 00:03

Gary

1,601
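
The soft-thresholding formula quoted in the excerpt above can be evaluated directly. A minimal sketch in plain Python, where soft_threshold, beta_ls, and the sample values are hypothetical names and numbers chosen for illustration; it only applies the quoted expression coordinate by coordinate and does not derive it:

```python
def soft_threshold(beta_ls, gamma):
    """Apply sgn(b) * max(|b| - gamma, 0) to each least-squares coefficient."""
    result = []
    for b in beta_ls:
        shrunk = max(abs(b) - gamma, 0.0)   # the (.)^+ positive part
        sign = (b > 0) - (b < 0)            # sgn(b): -1, 0, or 1
        # Coefficients within gamma of zero are set exactly to 0.0.
        result.append(sign * shrunk if shrunk > 0 else 0.0)
    return result

# Hypothetical least-squares coefficients and threshold.
print(soft_threshold([3.0, -0.5, 1.5, -2.0], gamma=1.0))
# prints [2.0, 0.0, 0.5, -1.0]
```

Coefficients larger than the threshold in absolute value move toward zero by gamma, and the rest become exactly zero.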

72

votes

4 answers

How to tune hyperparameters of xgboost trees?

I have class-imbalanced data and I want to tune the hyperparameters of the boosted trees using xgboost. Questions: Is there an equivalent of gridsearchcv or randomsearchcv for xgboost? If not, what is the recommended approach to tune the parameters…

asked Sep 04 '15 at 02:23

GeorgeOfTheRF

5,593

72

votes

6 answers

Difference between "kernel" and "filter" in CNN

What is the difference between the terms "kernel" and "filter" in the context of convolutional neural networks?

asked May 31 '15 at 06:19

ryguy

941

72

votes

12 answers

What does orthogonal mean in the context of statistics?

In other contexts, orthogonal means "at right angles" or "perpendicular". What does orthogonal mean in a statistical context? Thanks for any clarifications.

descriptive-statistics

asked Jun 20 '11 at 12:38

pmgjones

5,773

72

votes

3 answers

Why does ridge estimate become better than OLS by adding a constant to the diagonal?

I understand that the ridge regression estimate is the $\beta$ that minimizes the residual sum of squares plus a penalty on the size of $\beta$: $$\beta_\mathrm{ridge} = (\lambda I_D + X'X)^{-1}X'y = \operatorname{argmin}\big[ \text{RSS} + \lambda…

asked Oct 11 '14 at 18:52

Heisenberg

4,590

72

votes

4 answers

Intuitive explanation of Fisher Information and Cramer-Rao bound

I am not comfortable with Fisher information, what it measures, and how it is helpful. Also, its relationship with the Cramer-Rao bound is not apparent to me. Can someone please give an intuitive explanation of these concepts?

asked May 09 '11 at 20:43

Infinity

963

71

votes

2 answers

Do we need a global test before post hoc tests?

I often hear that post hoc tests after an ANOVA can only be used if the ANOVA itself was significant. However, post hoc tests adjust $p$-values to keep the global type I error rate at 5%, don't they? So why do we need the global test first? If…

asked Apr 19 '11 at 16:51

even

2,347

71

votes

2 answers

Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?

The coefficient of an explanatory variable in a multiple regression tells us the relationship of that explanatory variable with the dependent variable. All this, while 'controlling' for the other explanatory variables. How I have viewed it so…

asked Dec 07 '13 at 02:14

Siddharth Gopi

1,555
