Highest Voted Questions - Statistical Analysis Stack Exchange

44

votes

5 answers

Time series 'clustering' in R

I have a set of time series data. Each series covers the same period, although the actual dates in each time series may not all 'line up' exactly. That is to say, if the time series were to be read into a 2D matrix, it would look something like…

asked Oct 01 '10 at 14:58

morpheous

635

44

votes

4 answers

Variance of $K$-fold cross-validation estimates as $f(K)$: what is the role of "stability"?

TL;DR: It appears that, contrary to oft-repeated advice, leave-one-out cross-validation (LOO-CV) -- that is, $K$-fold CV with $K$ (the number of folds) equal to $N$ (the number of training observations) -- yields estimates of the generalization…

asked May 20 '17 at 01:11

Jake Westfall

12,557

44

votes

4 answers

Good accuracy despite high loss value

While training a simple neural network binary classifier I get a high loss value, using cross-entropy. Despite this, the accuracy on the validation set remains quite good. Does this mean anything? There is not a strict correlation between…

asked Jan 25 '17 at 21:27

user146655

441

44

votes

3 answers

Were generative adversarial networks introduced by Jürgen Schmidhuber?

I read on https://en.wikipedia.org/wiki/Generative_adversarial_networks: "[Generative adversarial networks] were introduced by Ian Goodfellow et al. in 2014", but Jürgen Schmidhuber claims to have performed similar work earlier in that direction…

asked Dec 13 '16 at 23:59

Franck Dernoncourt

46,817

44

votes

4 answers

Intuition for Conditional Expectation of $\sigma$-algebra

Let $(\Omega,\mathscr{F},\mu)$ be a probability space, given a random variable $\xi:\Omega \to \mathbb{R}$ and a $\sigma$-algebra $\mathscr{G}\subseteq \mathscr{F}$ we can construct a new random variable $E[\xi|\mathscr{G}]$, which is the…

asked Aug 18 '16 at 17:45

Nicolas Bourbaki

2,859

44

votes

3 answers

Computing p-value using bootstrap with R

I use the "boot" package to compute an approximate two-sided bootstrap p-value, but the result is too far from the p-value given by t.test. I can't figure out what I did wrong in my R code. Can someone please give me a hint for this time =…

asked Jan 07 '12 at 04:45

Tu.2

2,957

44

votes

1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (on Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…

asked Mar 11 '16 at 10:18

patapouf_ai

543

44

votes

3 answers

How to decide which glm family to use?

I have fish density data that I am trying to compare between several different collection techniques. The data has lots of zeros, and the histogram looks vaguely appropriate for a Poisson distribution except that, as densities, it is not integer…

asked Jan 14 '16 at 23:57

C. Denney

695

44

votes

4 answers

Do null and alternative hypotheses have to be exhaustive or not?

I have often seen claims that they have to be exhaustive (the examples in such books were always set up so that they indeed were). On the other hand, I have also often seen books stating that they should be exclusive (for example…

hypothesis-testing

asked Nov 26 '11 at 10:24

greenoldman

663

44

votes

2 answers

What is elastic net regularization, and how does it solve the drawbacks of Ridge ($L^2$) and Lasso ($L^1$)?

Is elastic net regularization always preferred to Lasso & Ridge since it seems to solve the drawbacks of these methods? What is the intuition and what is the math behind elastic net?

asked Nov 28 '15 at 17:38

GeorgeOfTheRF

5,593

44

votes

3 answers

Relation between confidence interval and testing statistical hypothesis for t-test

It is well known that confidence intervals and statistical hypothesis testing are strongly related. My question focuses on the comparison of means for two groups based on a numerical variable. Let's assume that such a hypothesis is tested using…

asked Nov 10 '11 at 21:49

Lan

1,409

44

votes

1 answer

When is nested cross-validation really needed and can make a practical difference?

When using cross-validation to do model selection (e.g. hyperparameter tuning) and to assess the performance of the best model, one should use nested cross-validation. The outer loop is to assess the performance of the model, and the inner…

asked Oct 22 '15 at 14:11

amoeba

104,745

44

votes

5 answers

Why is the CDF of a sample uniformly distributed

I read here that given a sample $X_1, X_2, \dots, X_n$ from a continuous distribution with CDF $F_X$, the sample corresponding to $U_i = F_X(X_i)$ follows a standard uniform distribution. I have verified this using qualitative simulations in…

asked Jul 15 '15 at 16:33

Maxime Tremblay

443
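
The claim in the excerpt above, that $U_i = F_X(X_i)$ is standard uniform when $F_X$ is continuous, can be checked with a short simulation. What follows is only a minimal sketch and not the asker's code: the exponential distribution, the rate, the sample size, the seed, and the helper names (`exponential_cdf`, `ks_distance_to_uniform`, `simulate_pit`) are all assumptions made for illustration.

```python
import math
import random


def exponential_cdf(x, rate):
    """CDF of an Exponential(rate) distribution, F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)


def ks_distance_to_uniform(values):
    """Largest gap between the empirical CDF of `values` and the Uniform(0, 1) CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return max(
        max((i + 1) / n - u, u - i / n)
        for i, u in enumerate(ordered)
    )


def simulate_pit(n=10_000, rate=2.0, seed=0):
    """Draw X_i ~ Exponential(rate) and return U_i = F_X(X_i)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    sample = [rng.expovariate(rate) for _ in range(n)]
    return [exponential_cdf(x, rate) for x in sample]


u = simulate_pit()
print("mean of U:", sum(u) / len(u))
print("KS distance to Uniform(0, 1):", ks_distance_to_uniform(u))
```

If the transformed values are uniform, the mean should sit near 0.5 and the Kolmogorov-Smirnov distance should shrink roughly like $1/\sqrt{n}$ as the sample grows. Swapping in a different continuous distribution only requires changing the sampler and its matching CDF.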

44

votes

3 answers

10-fold Cross-validation vs leave-one-out cross-validation

I'm doing nested cross-validation. I have read that leave-one-out cross-validation can be biased (don't remember why). Is it better to use 10-fold cross-validation or leave-one-out cross-validation apart from the longer runtime for leave-one-out…

asked May 31 '15 at 12:26

machinery

1,784

44

votes

1 answer

What is the intuitive reason behind doing rotations in Factor Analysis/PCA & how to select appropriate rotation?

My questions: What is the intuitive reason behind doing rotations of factors in factor analysis (or components in PCA)? My understanding is, if variables are almost equally loaded in the top components (or factors) then obviously it is difficult to…

asked May 10 '15 at 14:40

GeorgeOfTheRF

5,593
