Most Popular
1500 questions
40 votes, 4 answers
Independent variable = Random variable?
I'm slightly confused about whether an independent variable (also called a predictor or feature) in a statistical model, for example the $X$ in the linear regression $Y=\beta_0+\beta_1 X$, is a random variable.
l7ll7
- 1,275
40 votes, 1 answer
How to interpret variance and correlation of random effects in a mixed-effects model?
I hope you all don't mind this question, but I need help interpreting the output of a linear mixed-effects model that I've been trying to learn to fit in R. I am new to longitudinal data analysis and linear mixed-effects regression. I have a model I…
Zeda
- 501
40 votes, 6 answers
Effect size as the hypothesis for significance testing
Today, at the Cross Validated Journal Club (why weren't you there?), @mbq asked:
Do you think we (modern data scientists) know what significance means? And how it relates to our confidence in our results?
@Michelle replied as some (including me)…
Carlos Accioly
- 5,025
40 votes, 2 answers
Is Tikhonov regularization the same as Ridge Regression?
Tikhonov regularization and ridge regression are terms often used as if they were identical. Is it possible to specify exactly what the difference is?
Carl
- 13,084
40 votes, 2 answers
Bootstrap prediction interval
Is there any bootstrap technique available to compute prediction intervals for point predictions obtained, e.g., from linear regression or another regression method (k-nearest neighbours, regression trees, etc.)?
Somehow I feel that the sometimes proposed…
Michael M
- 11,815
40 votes, 4 answers
Why use colormap viridis over jet?
As announced in https://www.youtube.com/watch?v=xAoljeRJ3lU, Matplotlib changes the default colormap from jet to viridis.
However, I don't understand the change very well. Maybe because I'm color blind?
The original colormap jet looks very strong, I can…
ZK Zhao
- 1,275
40 votes, 5 answers
Difference between feedback RNN and LSTM/GRU
I am trying to understand different Recurrent Neural Network (RNN) architectures to be applied to time series data and I am getting a bit confused with the different names that are frequently used when describing RNNs. Is the structure of Long…
Josie
- 503
40 votes, 3 answers
Meaning (and proof) of "RNN can approximate any algorithm"
Recently I read that a recurrent neural network can approximate any algorithm.
So my question is: what exactly does this mean, and can you give me a reference where this is proved?
user3726947
- 503
40 votes, 1 answer
XGBoost Loss function Approximation With Taylor Expansion
As an example, take the objective function of the XGBoost model on the $t$'th iteration:
$$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$
where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…
Alex R.
- 13,897
40 votes, 3 answers
What does entropy tell us?
I am reading about entropy and am having a hard time conceptualizing what it means in the continuous case. The wiki page states the following:
The probability distribution of the events, coupled with the information amount of every event, forms…
RustyStatistician
- 1,989
40 votes, 1 answer
Why do we need to normalize the images before we put them into CNN?
I am not clear on the reason why we normalise the image for a CNN by computing (image - mean_image). Thanks!
Zhi Lu
- 737
40 votes, 5 answers
How to derive the likelihood function for binomial distribution for parameter estimation?
According to Miller and Freund's Probability and Statistics for Engineers, 8th ed. (pp. 217-218), the likelihood function to be maximised for the binomial distribution (Bernoulli trials) is given as
$L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}$
How to arrive at…
Ébe Isaac
- 1,082
40 votes, 3 answers
Theory behind partial least squares regression
Can anyone recommend a good exposition of the theory behind partial least squares regression (available online) for someone who understands SVD and PCA? I have looked at many sources online and have not found anything that had the right combination…
ClarPaul
- 1,270
40 votes, 5 answers
Do working statisticians care about the difference between frequentist and Bayesian inference?
As an outsider, it appears that there are two competing views on how one should perform statistical inference.
Are the two different methods both considered valid by working statisticians?
Is choosing one considered more of a philosophical…
Jonathan Fischoff
- 231
40 votes, 3 answers
Do we need gradient descent to find the coefficients of a linear regression model?
I was trying to learn machine learning using the Coursera material. In this lecture, Andrew Ng uses the gradient descent algorithm to find the coefficients of the linear regression model that minimize the error function (cost function).
For linear…
Victor
- 6,565