Highest Voted Questions - Statistical Analysis Stack Exchange

62

votes

1 answer

Bootstrap vs. jackknife

Both bootstrap and jackknife methods can be used to estimate the bias and standard error of an estimate, and the mechanisms of the two resampling methods are not hugely different: sampling with replacement vs. leaving out one observation at a time. However,…

asked Jan 13 '12 at 03:09

Tu.2

2,957
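
To make the two mechanisms in the excerpt above concrete, here is a minimal sketch that estimates the standard error of a sample mean both ways. The function names (`bootstrap_se`, `jackknife_se`), the sample values, and the number of resamples are illustrative assumptions, not taken from the question or its answers.

```python
import random
import statistics


def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Bootstrap: recompute `stat` on samples drawn WITH replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n = len(data)
    reps = [
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    # Spread of the resampled statistics approximates the standard error.
    return statistics.stdev(reps)


def jackknife_se(data, stat):
    """Jackknife: recompute `stat` leaving out one observation at a time."""
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    # Standard jackknife variance formula with the (n - 1) / n inflation factor.
    return ((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps)) ** 0.5


# Hypothetical data, used only for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
boot = bootstrap_se(sample, statistics.mean)
jack = jackknife_se(sample, statistics.mean)
print(f"bootstrap SE: {boot:.3f}, jackknife SE: {jack:.3f}")
```

For the sample mean specifically, the jackknife estimate coincides with the textbook s/sqrt(n), while the bootstrap estimate varies with the random resamples drawn.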

62

votes

4 answers

How does LSTM prevent the vanishing gradient problem?

LSTM was invented specifically to avoid the vanishing gradient problem. It is supposed to do that with the Constant Error Carousel (CEC), which in the diagram below (from Greff et al.) corresponds to the loop around the cell. (source:…

asked Dec 08 '15 at 09:01

TheWalkingCube

723

62

votes

3 answers

Why does shrinkage work?

In order to solve problems of model selection, a number of methods (LASSO, ridge regression, etc.) will shrink the coefficients of predictor variables towards zero. I am looking for an intuitive explanation of why this improves predictive ability.…

asked Nov 02 '15 at 20:29

aspiringstatistician

621

62

votes

2 answers

Should I normalize word2vec's word vectors before using them?

After training word vectors with word2vec, is it better to normalize them before using them for some downstream applications? I.e., what are the pros/cons of normalizing them?

asked Oct 20 '15 at 23:56

Franck Dernoncourt

46,817

62

votes

10 answers

Measuring entropy/information/patterns of a 2D binary matrix

I want to measure the entropy/information density/pattern-likeness of a two-dimensional binary matrix. Let me show some pictures for clarification: This display should have a rather high entropy: A) This should have medium entropy: B) These…

asked Oct 17 '11 at 12:39

Felix S

4,700

62

votes

3 answers

Why do Convolutional Neural Networks not use a Support Vector Machine to classify?

In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art for object recognition in computer vision. Typically, a CNN consists of several convolutional layers, followed by two fully-connected layers. An intuition behind…

asked Aug 20 '15 at 14:43

Karnivaurus

7,019

62

votes

4 answers

Why sigmoid function instead of anything else?

Why is the de facto standard sigmoid function, $\frac{1}{1+e^{-x}}$, so popular in (non-deep) neural networks and logistic regression? Why don't we use many of the other differentiable functions, with faster computation time or slower decay (so…

asked Jul 24 '15 at 11:14

Mark Horvath

895

62

votes

4 answers

Recurrent vs Recursive Neural Networks: Which is better for NLP?

There are Recurrent Neural Networks and Recursive Neural Networks. Both are usually denoted by the same acronym: RNN. According to Wikipedia, Recurrent NN are in fact Recursive NN, but I don't really understand the explanation. Moreover, I don't…

asked May 22 '15 at 17:50

crscardellino

905

62

votes

12 answers

Software needed to scrape data from graph

Does anybody have any experience with software (preferably free, preferably open source) that will take an image of data plotted on Cartesian coordinates (a standard, everyday plot) and extract the coordinates of the points plotted on the…

asked Aug 18 '11 at 04:14

Alex Holcombe

539

62

votes

7 answers

Period detection of a generic time series

This post is the continuation of another post related to a generic method for outlier detection in time series. Basically, at this point I'm interested in a robust way to discover the periodicity/seasonality of a generic time series affected by a…

asked Aug 04 '10 at 00:32

gianluca

1,981

61

votes

2 answers

What is the difference between a Normal and a Gaussian Distribution?

Is there a deep difference between a Normal and a Gaussian distribution? I've seen many papers using them without distinction, and I usually also refer to them as the same thing. However, my PI recently told me that a normal is the specific case of…

asked Apr 12 '13 at 17:28

Leon palafox

885

61

votes

10 answers

What does "Scientists rise up against statistical significance" mean? (Comment in Nature)

The title of the Comment in Nature Scientists rise up against statistical significance begins with: Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly…

asked Mar 21 '19 at 01:19

uhoh

685

61

votes

4 answers

Box-Cox like transformation for independent variables?

Is there a Box-Cox like transformation for independent variables? That is, a transformation that optimizes the $x$ variable so that the y~f(x) will make a more reasonable fit for a linear model? If so, is there a function to perform this with R?

asked Sep 05 '12 at 10:37

Tal Galili

21,541

61

votes

7 answers

Industry vs Kaggle challenges. Is collecting more observations and having access to more variables more important than fancy modelling?

I'd hope the title is self-explanatory. In Kaggle, most winners use stacking, sometimes with hundreds of base models, to squeeze out a few extra % of MSE, accuracy... In general, in your experience, how important is fancy modelling such as stacking vs…

asked Jul 10 '18 at 12:42

Tom

1,373

61

votes

11 answers

Brain teaser: How to generate 7 integers with equal probability using a biased coin that has a pr(head) = p?

This is a question I found on Glassdoor: How does one generate 7 integers with equal probability using a coin that has a $\mathbb{Pr}(\text{Head}) = p\in(0,1)$? Basically, you have a coin that may or may not be fair, and this is the only…

asked Jul 05 '18 at 19:55

Amazonian

1,534
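
One well-known way to approach the brain teaser above, sketched here under stated assumptions rather than as the accepted answer: first turn the biased coin into a fair bit with the von Neumann trick (flip twice, keep the first flip only when the two flips differ), then build a 3-bit number and reject the one unused value. The function names, the choice of the integers 1 to 7, the value p = 0.3, and the seed are all hypothetical.

```python
import random


def biased_flip(p, rng):
    """Simulate the biased coin: True (head) with probability p."""
    return rng.random() < p


def fair_bit(p, rng):
    """Von Neumann trick: HT and TH are equally likely, so keep only those."""
    while True:
        first, second = biased_flip(p, rng), biased_flip(p, rng)
        if first != second:
            return int(first)


def uniform_1_to_7(p, rng):
    """Rejection sampling: draw 3 fair bits (0..7) and retry on the value 7."""
    while True:
        value = 4 * fair_bit(p, rng) + 2 * fair_bit(p, rng) + fair_bit(p, rng)
        if value < 7:
            return value + 1  # map 0..6 to the integers 1..7


# Hypothetical bias and seed, chosen only for the demonstration.
rng = random.Random(42)
draws = [uniform_1_to_7(0.3, rng) for _ in range(7000)]
counts = {k: draws.count(k) for k in range(1, 8)}
print(counts)
```

This sketch never needs to know p, but it can use many flips when p is close to 0 or 1, because most flip pairs are then discarded.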
