Most Popular

1500 questions
40
votes
4 answers

Independent variable = Random variable?

I'm slightly confused about whether an independent variable (also called a predictor or feature) in a statistical model, for example the $X$ in the linear regression $Y=\beta_0+\beta_1 X$, is a random variable.
l7ll7
  • 1,275
40
votes
1 answer

How to interpret variance and correlation of random effects in a mixed-effects model?

I hope you all don't mind this question, but I need help interpreting the output of a linear mixed-effects model that I've been trying to learn to fit in R. I am new to longitudinal data analysis and linear mixed-effects regression. I have a model I…
Zeda
  • 501
40
votes
6 answers

Effect size as the hypothesis for significance testing

Today, at the Cross Validated Journal Club (why weren't you there?), @mbq asked: Do you think we (modern data scientists) know what significance means? And how it relates to our confidence in our results? @Michelle replied as some (including me)…
Carlos Accioly
  • 5,025
40
votes
2 answers

Is Tikhonov regularization the same as Ridge Regression?

Tikhonov regularization and ridge regression are terms often used as if they were identical. Is it possible to specify exactly what the difference is?
Carl
  • 13,084
40
votes
2 answers

Bootstrap prediction interval

Is there any bootstrap technique available to compute prediction intervals for point predictions obtained e.g. from linear regression or other regression methods (k-nearest neighbour, regression trees etc.)? Somehow I feel that the sometimes proposed…
Michael M
  • 11,815
40
votes
4 answers

Why use colormap viridis over jet?

As announced in https://www.youtube.com/watch?v=xAoljeRJ3lU, Matplotlib is changing its default colormap from jet to viridis. However, I don't understand the reasoning very well. Maybe because I'm color blind? The original colormap jet looks very strong, I can…
ZK Zhao
  • 1,275
40
votes
5 answers

Difference between feedback RNN and LSTM/GRU

I am trying to understand the different Recurrent Neural Network (RNN) architectures that can be applied to time series data, and I am getting a bit confused by the different names that are frequently used when describing RNNs. Is the structure of Long…
Josie
  • 503
40
votes
3 answers

Meaning (and proof) of "RNN can approximate any algorithm"

Recently I read that a recurrent neural network can approximate any algorithm. So my question is: what exactly does this mean, and can you give me a reference where this is proved?
user3726947
  • 503
40
votes
1 answer

XGBoost Loss function Approximation With Taylor Expansion

As an example, take the objective function of the XGBoost model on the $t$'th iteration: $$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$ where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…
Alex R.
  • 13,897
40
votes
3 answers

What does entropy tell us?

I am reading about entropy and am having a hard time conceptualizing what it means in the continuous case. The wiki page states the following: The probability distribution of the events, coupled with the information amount of every event, forms…
40
votes
1 answer

Why do we need to normalize the images before we put them into CNN?

I am not clear on the reason why we normalise images for a CNN by subtracting the mean image, i.e. (image - mean_image). Thanks!
Zhi Lu
  • 737
40
votes
5 answers

How to derive the likelihood function for binomial distribution for parameter estimation?

According to Miller and Freund's Probability and Statistics for Engineers, 8th ed. (pp. 217-218), the likelihood function to be maximised for the binomial distribution (Bernoulli trials) is given as $L(p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}$. How to arrive at…
Ébe Isaac
  • 1,082
40
votes
3 answers

Theory behind partial least squares regression

Can anyone recommend a good exposition of the theory behind partial least squares regression (available online) for someone who understands SVD and PCA? I have looked at many sources online and have not found anything that had the right combination…
ClarPaul
  • 1,270
40
votes
5 answers

Do working statisticians care about the difference between frequentist and Bayesian inference?

As an outsider, it appears that there are two competing views on how one should perform statistical inference. Are the two different methods both considered valid by working statisticians? Is choosing one considered more of a philosophical…
40
votes
3 answers

Do we need gradient descent to find the coefficients of a linear regression model?

I was trying to learn machine learning using the Coursera material. In this lecture, Andrew Ng uses the gradient descent algorithm to find the coefficients of the linear regression model that minimize the error function (cost function). For linear…
Victor
  • 6,565