Highest Voted Questions - Statistical Analysis Stack Exchange

63

votes

6 answers

Practical hyperparameter optimization: Random vs. grid search

I'm currently going through Bergstra and Bengio's Random Search for Hyper-Parameter Optimization [1], where the authors claim random search is more efficient than grid search in achieving approximately equal performance. My question is: Do people…

asked Jul 08 '15 at 14:25

Bar

2,862
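
A minimal sketch of the comparison the question refers to, assuming a toy objective that depends almost entirely on one of two hyperparameters. The objective, the budget of 9 trials, and all names here are illustrative assumptions, not taken from the paper or the question. With the same budget, the grid only ever tries 3 distinct values of the important parameter, while random search tries 9.

```python
import random

def grid_points(n_per_axis):
    """All combinations of n_per_axis evenly spaced values in [0, 1] on each axis."""
    axis = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return [(x, y) for x in axis for y in axis]

def random_points(n, seed=0):
    """n points drawn uniformly from the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def objective(x, y):
    # Hypothetical score surface: x matters a lot, y barely matters.
    return -(x - 0.37) ** 2 + 0.001 * y

grid = grid_points(3)      # 3 x 3 = 9 trials
rand = random_points(9)    # 9 trials, same budget

# How many different settings of the important parameter did each method try?
distinct_x_grid = len({x for x, _ in grid})
distinct_x_rand = len({x for x, _ in rand})

best_grid = max(objective(x, y) for x, y in grid)
best_rand = max(objective(x, y) for x, y in rand)
print(distinct_x_grid, distinct_x_rand, best_grid, best_rand)
```

Which method finds the better score on a given run depends on the seed and the objective; the sketch only illustrates the coverage argument.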

63

votes

2 answers

Is it unusual for the MEAN to outperform ARIMA?

I recently applied a range of forecasting methods (MEAN, RWF, ETS, ARIMA and MLPs) and found that MEAN did surprisingly well. (MEAN: where all future predictions are set equal to the arithmetic mean of the observed values.) MEAN even…

asked Nov 21 '14 at 13:14

Andy T

1,184

62

votes

3 answers

Which has the heavier tail, lognormal or gamma?

(This is based on a question that just came to me via email; I've added some context from a previous brief conversation with the same person.) Last year I was told that the gamma distribution is heavier tailed than the lognormal, and I've since been…

asked Feb 13 '14 at 06:01

Glen_b

282,281

62

votes

5 answers

What are disadvantages of state-space models and Kalman Filter for time-series modelling?

Given all the good properties of state-space models and the Kalman filter (KF), I wonder: what are the disadvantages of state-space modelling and of using the Kalman filter (or EKF, UKF or particle filter) for estimation, compared with, say, conventional methodologies like ARIMA, VAR or…

asked Dec 02 '13 at 10:53

Kochede

2,117

62

votes

3 answers

ANOVA assumption normality/normal distribution of residuals

The Wikipedia page on ANOVA lists three assumptions, namely: Independence of cases – this is an assumption of the model that simplifies the statistical analysis. Normality – the distributions of the residuals are normal. Equality (or "homogeneity")…

asked Jan 18 '11 at 19:07

Roman Luštrik

3,718

62

votes

6 answers

A chart of daily cases of COVID-19 in a Russian region looks suspiciously level to me - is this so from the statistics viewpoint?

Below is a daily chart of newly-detected COVID infections in Krasnodar Krai, a region of Russia, from April 29 to May 19. The population of the region is 5.5 million people. I read about it and wondered - does this (relatively smooth dynamics of new…

asked May 21 '20 at 11:53

CopperKettle

1,213

62

votes

4 answers

Choosing between LM and GLM for a log-transformed response variable

I'm trying to understand the philosophy behind using a Generalized Linear Model (GLM) vs a Linear Model (LM). I've created an example data set below where: $$\log(y) = x + \varepsilon $$ The example does not have the error $\varepsilon$ as a…

asked Nov 19 '12 at 13:28

Marc in the box

3,712

62

votes

7 answers

Intuitive explanation of the bias-variance tradeoff?

I am looking for an intuitive explanation of the bias-variance tradeoff, both in general and specifically in the context of linear regression.

asked Nov 07 '10 at 10:57

NPE

5,581

62

votes

6 answers

Introduction to statistics for mathematicians

What is a good introduction to statistics for a mathematician who is already well-versed in probability? I have two distinct motivations for asking, which may well lead to different suggestions: I'd like to better understand the statistics…

references

asked Jul 21 '10 at 13:50

Mark Meckes

3,126

62

votes

4 answers

How to generate correlated random numbers (given means, variances and degree of correlation)?

I'm sorry if this seems a bit too basic, but I guess I'm just looking to confirm understanding here. I get the sense I'd have to do this in two steps, and I've started trying to grok correlation matrices, but it's just starting to seem really…

asked Oct 07 '12 at 19:45

Joseph Weissman

731
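
One common way to do what the title asks, sketched for the two-variable case and assuming normally distributed values (the question does not state a distribution). Two independent standard normals are mixed so that the second output shares a fraction `rho` of the first; this is the 2x2 Cholesky factor of the correlation matrix written out by hand. The function names and the example parameters are illustrative.

```python
import math
import random

def correlated_pairs(mean_x, mean_y, var_x, var_y, rho, n, seed=0):
    """Draw n (x, y) pairs with the given means, variances and correlation rho."""
    rng = random.Random(seed)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Step 1: build standard normals with correlation rho.
        u = z1
        v = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        # Step 2: rescale and shift to the requested variances and means.
        pairs.append((mean_x + sd_x * u, mean_y + sd_y * v))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

pairs = correlated_pairs(1.0, -2.0, 4.0, 9.0, rho=0.6, n=100_000)
r = sample_correlation(pairs)
print(round(r, 3))
```

For more than two variables the same idea is usually expressed with a Cholesky factor of the full covariance matrix.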

62

votes

7 answers

Why is the regularization term added to the cost function (instead of multiplied etc.)?

Whenever regularization is used, it is often added to the cost function, as in the following: $$ J(\theta)=\frac 1 2(y-\theta X^T)(y-\theta X^T)^T+\alpha\|\theta\|_2^2 $$ This makes intuitive sense to me since minimize the cost…

regularization

asked May 22 '18 at 09:48

grenmester

745
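
The cost function quoted in the excerpt can be evaluated directly. This is a minimal sketch, assuming $\theta$ and $y$ are row vectors and each row of $X$ is one observation; the function name `ridge_cost` and the small data set are illustrative, not from the question.

```python
def ridge_cost(theta, X, y, alpha):
    """J(theta) = 1/2 * ||y - theta X^T||^2 + alpha * ||theta||_2^2."""
    # Residual for each observation: y_i minus the dot product of theta and row i of X.
    residuals = [
        yi - sum(t * xij for t, xij in zip(theta, row))
        for row, yi in zip(X, y)
    ]
    data_term = 0.5 * sum(r * r for r in residuals)
    penalty = alpha * sum(t * t for t in theta)
    return data_term + penalty

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

# theta = 0 pays no penalty but fits nothing; theta = (1, 2) fits exactly but pays the penalty.
cost_zero = ridge_cost([0.0, 0.0], X, y, alpha=0.1)
cost_fit = ridge_cost([1.0, 2.0], X, y, alpha=0.1)
print(cost_zero, cost_fit)
```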

62

votes

7 answers

Is there any gold standard for modeling irregularly spaced time series?

In the field of economics (I think) we have ARIMA and GARCH for regularly spaced time series, and Poisson and Hawkes processes for modeling point processes. What about attempts at modeling irregularly (unevenly) spaced time series: are there (at least) any…

asked Aug 06 '12 at 21:05

Qbik

1,707

62

votes

3 answers

Why do we only see $L_1$ and $L_2$ regularization but not other norms?

I am just curious why there are usually only $L_1$ and $L_2$ norm regularizations. Are there proofs of why these are better?

asked Mar 23 '17 at 09:28

user10024395

1

62

votes

1 answer

Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?

In my mind, KL divergence from sample distribution to true distribution is simply the difference between cross entropy and entropy. Why do we use cross entropy to be the cost function in many machine learning models, but use Kullback-Leibler…

asked Mar 07 '17 at 13:26

JimSpark

723
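
The identity the excerpt relies on, $KL(p\,\|\,q) = H(p, q) - H(p)$, can be checked numerically. This is a small sketch for discrete distributions using natural logarithms; the two example distributions are arbitrary illustrative choices.

```python
import math

def entropy(p):
    """H(p) = -sum p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl = kl_divergence(p, q)
ce = cross_entropy(p, q)
h = entropy(p)
print(kl, ce - h)  # the two values agree up to floating-point error
```

Because $H(p)$ does not depend on $q$, minimizing cross entropy over $q$ and minimizing KL divergence over $q$ pick the same $q$ whenever $p$ is held fixed.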

62

votes

2 answers

How does centering the data get rid of the intercept in regression and PCA?

I keep reading about instances where we center the data (e.g., with regularization or PCA) in order to remove the intercept (as mentioned in this question). I know it's simple, but I'm having a hard time intuitively understanding this. Could someone…

asked Feb 06 '12 at 06:45

Alec

2,385
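
For the regression half of the question, a short numerical check may help. This sketch assumes simple (one-predictor) least squares; the data values and function names are made up for illustration. After subtracting the means of $x$ and $y$, the fitted intercept is zero and a through-the-origin fit recovers the same slope as the original fit with an intercept.

```python
def ols_with_intercept(x, y):
    """Least-squares slope and intercept for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def ols_through_origin(x, y):
    """Least-squares slope for y = b*x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Center both variables by subtracting their means.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
xc = [xi - x_bar for xi in x]
yc = [yi - y_bar for yi in y]

slope_raw, intercept_raw = ols_with_intercept(x, y)
slope_centered, intercept_centered = ols_with_intercept(xc, yc)
slope_no_intercept = ols_through_origin(xc, yc)
print(slope_raw, intercept_raw, slope_centered, intercept_centered, slope_no_intercept)
```

The intercept formula $\hat a = \bar y - \hat b \bar x$ makes the reason visible: once both means are zero, the intercept has nothing left to absorb.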
