Most Popular
1500 questions
48
votes
2 answers
How to model non-negative zero-inflated continuous data?
I'm currently trying to apply a linear model (family = gaussian) to an indicator of biodiversity that cannot take values lower than zero, is zero-inflated and is continuous. Values range from 0 to a little over 0.25. As a consequence, there is quite…
David
48
votes
3 answers
How are Random Forests not sensitive to outliers?
I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example).
However, two pieces of intuition tell me otherwise:
Whenever a decision…
makansij
48
votes
7 answers
Where to start with statistics for an experienced developer
During the first half of 2015 I took the Coursera Machine Learning course (by Andrew Ng, GREAT course) and learned the basics of machine learning (linear regression, logistic regression, SVM, neural networks...).
Also I have been a developer for…
Juan Antonio Gomez Moriano
48
votes
6 answers
How do I avoid overlapping labels in an R plot?
I'm trying to label a pretty simple scatterplot in R. This is what I use:
plot(SI, TI)
text(SI, TI, Name, pos=4, cex=0.7)
The result is mediocre, as you can see:
I tried to compensate for this using the textxy function, but it's…
slhck
48
votes
7 answers
Combining probabilities/information from different sources
Let's say I have three independent sources and each of them makes a prediction for the weather tomorrow. The first one says that the probability of rain tomorrow is 0, then the second one says that the probability is 1, and finally the last one says…
Biela Diela
48
votes
4 answers
How to calculate a confidence level for a Poisson distribution?
I would like to know how confident I can be in my $\lambda$. Does anyone know of a way to set upper and lower confidence limits for a Poisson distribution?
Observations ($n$) = 88
Sample mean ($\lambda$) = 47.18182
What would the 95% confidence look…
Travis
48
votes
7 answers
Why shouldn't the denominator of the covariance estimator be n-2 rather than n-1?
The denominator of the (unbiased) variance estimator is $n-1$ as there are $n$ observations and only one parameter is being estimated.
$$
\mathbb{V}\left(X\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1}
$$
By the same token I…
MYaseen208
48
votes
1 answer
Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom
Background
Suppose we have an Ordinary Least Squares model where we have $k$ coefficients in our regression model,
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where $\boldsymbol{\beta}$ is a $(k\times1)$ vector of coefficients,…
Garrett
47
votes
4 answers
Functions of Independent Random Variables
Is the claim that functions of independent random variables are themselves independent true?
I have seen that result often used implicitly in some proofs, for example in the proof of independence between the sample mean and the sample variance of…
JohnK
47
votes
4 answers
What is the difference between finite and infinite variance?
What is the difference between finite and infinite variance? My stats knowledge is rather basic; Wikipedia and Google weren't much help here.
47
votes
2 answers
Why is logistic regression a linear model?
I want to know why logistic regression is called a linear model. It uses a sigmoid function, which is not linear. So why is logistic regression a linear model?
user34790
47
votes
4 answers
McFadden's Pseudo-$R^2$ Interpretation
I have a binary logistic regression model with a McFadden's pseudo R-squared of 0.192 with a dependent variable called payment (1 = payment and 0 = no payment). What is the interpretation of this pseudo R-squared?
Is it a relative comparison for…
Matt Reichenbach
47
votes
8 answers
Rigorous definition of an outlier?
People often talk about dealing with outliers in statistics. The thing that bothers me about this is that, as far as I can tell, the definition of an outlier is completely subjective. For example, if the true distribution of some random variable…
dsimcha
47
votes
1 answer
Neural Networks: weight change momentum and weight decay
Momentum $\alpha$ is used to diminish the fluctuations in weight changes over consecutive iterations:
$$\Delta w_i(t+1) = - \eta\frac{\partial E}{\partial w_i} + \alpha \Delta w_i(t),$$
where $E({\bf w})$ is the error function, ${\bf w}$ -…
Oleg Shirokikh
47
votes
3 answers
Whether to rescale indicator / binary / dummy predictors for LASSO
For the LASSO (and other model selection procedures) it is crucial to rescale the predictors. The general recommendation I follow is simply to normalize continuous variables to mean 0 and standard deviation 1. But what is there to do with…
László