Most Popular

1500 questions
48
votes
2 answers

How to model non-negative zero-inflated continuous data?

I'm currently trying to apply a linear model (family = gaussian) to an indicator of biodiversity that cannot take values lower than zero, is zero-inflated and is continuous. Values range from 0 to a little over 0.25. As a consequence, there is quite…
David
  • 481
48
votes
3 answers

How are Random Forests not sensitive to outliers?

I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example). However, two pieces of intuition tell me otherwise: Whenever a decision…
makansij
  • 2,279
48
votes
7 answers

Where to start with statistics for an experienced developer

During the first half of 2015 I took the Coursera Machine Learning course (by Andrew Ng, GREAT course) and learned the basics of machine learning (linear regression, logistic regression, SVM, neural networks...). I have also been a developer for…
48
votes
6 answers

How do I avoid overlapping labels in an R plot?

I'm trying to label a pretty simple scatterplot in R. This is what I use: plot(SI, TI); text(SI, TI, Name, pos=4, cex=0.7). The result is mediocre, as you can see. I tried to compensate for this using the textxy function, but it's…
slhck
  • 837
48
votes
7 answers

Combining probabilities/information from different sources

Let's say I have three independent sources and each of them makes predictions for the weather tomorrow. The first one says that the probability of rain tomorrow is 0, then the second one says that the probability is 1, and finally the last one says…
Biela Diela
  • 601
48
votes
4 answers

How to calculate a confidence level for a Poisson distribution?

I would like to know how confident I can be in my $\lambda$. Does anyone know of a way to set upper and lower confidence levels for a Poisson distribution? Observations ($n$) = 88; sample mean ($\lambda$) = 47.18182. What would the 95% confidence look…
Travis
  • 771
48
votes
7 answers

Why shouldn't the denominator of the covariance estimator be n-2 rather than n-1?

The denominator of the (unbiased) variance estimator is $n-1$ as there are $n$ observations and only one parameter is being estimated. $$ \mathbb{V}\left(X\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1} $$ By the same token I…
MYaseen208
  • 2,719
48
votes
1 answer

Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

Background: Suppose we have an Ordinary Least Squares model with $k$ coefficients, $$\mathbf{y}=\mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}$$ where $\mathbf{\beta}$ is a $(k\times1)$ vector of coefficients,…
Garrett
  • 661
47
votes
4 answers

Functions of Independent Random Variables

Is the claim that functions of independent random variables are themselves independent true? I have often seen that result used implicitly in proofs, for example in the proof of independence between the sample mean and the sample variance of…
JohnK
  • 20,366
47
votes
4 answers

What is the difference between finite and infinite variance

What is the difference between finite and infinite variance? My stats knowledge is rather basic; Wikipedia and Google weren't much help here.
47
votes
2 answers

Why is logistic regression a linear model?

I want to know why logistic regression is called a linear model. It uses a sigmoid function, which is not linear. So why is logistic regression a linear model?
user34790
  • 6,757
47
votes
4 answers

McFadden's Pseudo-$R^2$ Interpretation

I have a binary logistic regression model with a McFadden's pseudo R-squared of 0.192 with a dependent variable called payment (1 = payment and 0 = no payment). What is the interpretation of this pseudo R-squared? Is it a relative comparison for…
47
votes
8 answers

Rigorous definition of an outlier?

People often talk about dealing with outliers in statistics. The thing that bothers me about this is that, as far as I can tell, the definition of an outlier is completely subjective. For example, if the true distribution of some random variable…
dsimcha
  • 8,739
47
votes
1 answer

Neural Networks: weight change momentum and weight decay

Momentum $\alpha$ is used to diminish the fluctuations in weight changes over consecutive iterations: $$\Delta w_i(t+1) = - \eta\frac{\partial E}{\partial w_i} + \alpha \Delta w_i(t),$$ where $E({\bf w})$ is the error function, ${\bf w}$ -…
Oleg Shirokikh
  • 895
47
votes
3 answers

Whether to rescale indicator / binary / dummy predictors for LASSO

For the LASSO (and other model selection procedures) it is crucial to rescale the predictors. The general recommendation I follow is simply to use a 0 mean, 1 standard deviation normalization for continuous variables. But what is there to do with…
László
  • 987