Most Popular

1500 questions
63
votes
6 answers

Practical hyperparameter optimization: Random vs. grid search

I'm currently going through Bengio's and Bergstra's Random Search for Hyper-Parameter Optimization [1] where the authors claim random search is more efficient than grid search in achieving approximately equal performance. My question is: Do people…
Bar
  • 2,862
63
votes
2 answers

Is it unusual for the MEAN to outperform ARIMA?

I recently applied a range of forecasting methods (MEAN, RWF, ETS, ARIMA and MLPs) and found that MEAN did surprisingly well. (MEAN: where all future predictions are predicted as being equal to the arithmetic mean of the observed values.) MEAN even…
Andy T
  • 1,184
  • 3
  • 10
  • 16
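The MEAN benchmark defined in the excerpt above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `mean_forecast` and the sample numbers are hypothetical and do not come from the question.

```python
from statistics import fmean


def mean_forecast(observed, horizon):
    """Predict every future step as the arithmetic mean of the observed values."""
    if not observed:
        raise ValueError("need at least one observation")
    level = fmean(observed)
    # The forecast is flat: the same value repeated for each future step.
    return [level] * horizon


# Hypothetical data, for illustration only.
history = [12.0, 15.0, 11.0, 14.0]
forecast = mean_forecast(history, horizon=3)
```

Because the forecast ignores the ordering of the observations, it is a natural baseline to compare against models such as ARIMA.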
62
votes
3 answers

Which has the heavier tail, lognormal or gamma?

(This is based on a question that just came to me via email; I've added some context from a previous brief conversation with the same person.) Last year I was told that the gamma distribution is heavier tailed than the lognormal, and I've since been…
Glen_b
  • 282,281
62
votes
5 answers

What are disadvantages of state-space models and Kalman Filter for time-series modelling?

Given all the good properties of state-space models and the KF, I wonder: what are the disadvantages of state-space modelling and of using the Kalman filter (or EKF, UKF or particle filter) for estimation, compared with, say, conventional methodologies like ARIMA, VAR or…
Kochede
  • 2,117
62
votes
3 answers

ANOVA assumption normality/normal distribution of residuals

The Wikipedia page on ANOVA lists three assumptions, namely: Independence of cases – this is an assumption of the model that simplifies the statistical analysis. Normality – the distributions of the residuals are normal. Equality (or "homogeneity")…
62
votes
6 answers

A chart of daily cases of COVID-19 in a Russian region looks suspiciously level to me - is this so from the statistics viewpoint?

Below is a daily chart of newly-detected COVID infections in Krasnodar Krai, a region of Russia, from April 29 to May 19. The population of the region is 5.5 million people. I read about it and wondered - does this (relatively smooth dynamics of new…
CopperKettle
  • 1,213
62
votes
4 answers

Choosing between LM and GLM for a log-transformed response variable

I'm trying to understand the philosophy behind using a Generalized Linear Model (GLM) vs a Linear Model (LM). I've created an example data set below where: $$\log(y) = x + \varepsilon $$ The example does not have the error $\varepsilon$ as a…
62
votes
7 answers

Intuitive explanation of the bias-variance tradeoff?

I am looking for an intuitive explanation of the bias-variance tradeoff, both in general and specifically in the context of linear regression.
NPE
  • 5,581
  • 6
  • 37
  • 45
62
votes
6 answers

Introduction to statistics for mathematicians

What is a good introduction to statistics for a mathematician who is already well-versed in probability? I have two distinct motivations for asking, which may well lead to different suggestions: I'd like to better understand the statistics…
Mark Meckes
  • 3,126
62
votes
4 answers

How to generate correlated random numbers (given means, variances and degree of correlation)?

I'm sorry if this seems a bit too basic, but I guess I'm just looking to confirm understanding here. I get the sense I'd have to do this in two steps, and I've started trying to grok correlation matrices, but it's just starting to seem really…
62
votes
7 answers

Why is the regularization term *added* to the cost function (instead of multiplied etc.)?

Whenever regularization is used, it is often added to the cost function, as in the following: $$ J(\theta)=\frac 1 2(y-\theta X^T)(y-\theta X^T)^T+\alpha\|\theta\|_2^2 $$ This makes intuitive sense to me since minimize the cost…
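As a rough sketch of the cost function quoted in the excerpt above, the two additive terms can be computed separately. The function name `ridge_cost` and the toy data are assumptions made for illustration; `theta` and `y` are treated as row vectors and `X` as a list of rows, matching the $\theta X^T$ convention in the formula.

```python
def ridge_cost(theta, X, y, alpha):
    """Evaluate J(theta) = 1/2 * ||y - theta X^T||^2 + alpha * ||theta||_2^2."""
    # Residual for each observation: y_i minus the dot product of theta and row i.
    residuals = [
        yi - sum(t * x for t, x in zip(theta, row))
        for row, yi in zip(X, y)
    ]
    data_term = 0.5 * sum(r * r for r in residuals)
    penalty = alpha * sum(t * t for t in theta)
    # The regularization term is added to the data-fit term.
    return data_term + penalty


# Hypothetical toy data: theta fits y exactly, so only the penalty remains.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
cost = ridge_cost([1.0, 2.0], X, y, alpha=0.1)
```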
62
votes
7 answers

Is there any gold standard for modeling irregularly spaced time series?

In the field of economics (I think) we have ARIMA and GARCH for regularly spaced time series, and Poisson and Hawkes processes for modeling point processes. What about attempts at modeling irregularly (unevenly) spaced time series: are there (at least) any…
Qbik
  • 1,707
62
votes
3 answers

Why do we only see $L_1$ and $L_2$ regularization but not other norms?

I am just curious why there is usually only $L_1$ and $L_2$ norm regularization. Are there proofs of why these are better?
user10024395
  • 1
  • 2
  • 11
  • 21
62
votes
1 answer

Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?

In my mind, KL divergence from sample distribution to true distribution is simply the difference between cross entropy and entropy. Why do we use cross entropy to be the cost function in many machine learning models, but use Kullback-Leibler…
JimSpark
  • 723
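The identity mentioned in the excerpt above, that KL divergence equals cross entropy minus entropy, can be checked numerically for discrete distributions. This is a minimal sketch: the function names and the two example distributions `p` and `q` are hypothetical, and natural logarithms are assumed.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
gap = cross_entropy(p, q) - entropy(p)
```

Here `gap` agrees with `kl_divergence(p, q)` up to floating-point error. When `p` is held fixed, its entropy is a constant, so minimizing either quantity over `q` gives the same minimizer.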
62
votes
2 answers

How does centering the data get rid of the intercept in regression and PCA?

I keep reading about instances where we center the data (e.g., with regularization or PCA) in order to remove the intercept (as mentioned in this question). I know it's simple, but I'm having a hard time intuitively understanding this. Could someone…
Alec
  • 2,385