Most Popular

1500 questions
47
votes
5 answers

Optimized implementations of the Random Forest algorithm

I have noticed that there are a few implementations of random forest such as ALGLIB, Waffles and some R packages like randomForest. Can anybody tell me whether these libraries are highly optimized? Are they basically equivalent to the random…
Henry B.
  • 1,629
46
votes
5 answers

Using R online - without installing it

Is it possible to use R through a web interface without installing it? I have only one small script that I would like to run, and I just want to give it a shot without a long installation procedure. Thank you.
vonjd
  • 6,146
46
votes
5 answers

Statistics published in academic papers

I read a lot of evolutionary/ecological academic papers, sometimes with the specific aim of seeing how statistics are being used 'in the real world' outside of the textbook. I normally take the statistics in papers as gospel and use the papers to…
luciano
  • 14,269
46
votes
4 answers

OpenBUGS vs. JAGS

I am about to try out a BUGS-style environment for estimating Bayesian models. Are there any important advantages to consider in choosing between OpenBUGS and JAGS? Is one likely to replace the other in the foreseeable future? I will be using the…
DanB
  • 958
46
votes
6 answers

How does cross-validation overcome the overfitting problem?

Why does a cross-validation procedure overcome the problem of overfitting a model?
user3269
  • 5,152
  • 10
  • 46
  • 55
46
votes
3 answers

What is meant by 'weak learner'?

Can anyone tell me what is meant by the phrase 'weak learner'? Is it supposed to be a weak hypothesis? I am confused about the relationship between a weak learner and a weak classifier. Are both the same or is there some difference? In the AdaBoost…
vrushali
  • 461
46
votes
1 answer

What is the difference between conditional and unconditional quantile regression?

The conditional quantile regression estimator by Koenker and Bassett (1978) for the $\tau^{th}$ quantile is defined as $$ \widehat{\beta}_{QR} = \arg\min_{b} \sum^{n}_{i=1} \rho_\tau (y_i - X'_i b_\tau) $$ where $\rho_\tau = u_i\cdot (\tau - 1(u_i<0))$…
AlexH
  • 966
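
To make the check function $\rho_\tau$ in the excerpt above concrete, here is a minimal sketch for the intercept-only case (no regressors), where minimizing the summed check loss over a constant recovers a sample quantile. The data values and the helper names `check_loss` and `total_loss` are made up for illustration; this is not the estimator code from any particular package.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(q, ys, tau):
    """Summed check loss of the residuals y_i - q for a constant fit q."""
    return sum(check_loss(y - q, tau) for y in ys)


def constant_quantile_fit(ys, tau):
    """Minimize the summed check loss over constants.

    The objective is piecewise linear and convex in q, so a minimizer
    is attained at one of the data points; searching those is enough.
    """
    return min(ys, key=lambda q: total_loss(q, ys, tau))


ys = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical sample
median_fit = constant_quantile_fit(ys, tau=0.5)
upper_fit = constant_quantile_fit(ys, tau=0.75)
print(median_fit, upper_fit)
```

With regressors, the same loss is minimized over the coefficient vector instead of a single constant, which is usually done with linear programming rather than a search like this.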
46
votes
4 answers

Bound for the correlation of three random variables

There are three random variables, $x,y,z$. The three correlations between the three variables are the same. That is, $$\rho=\textrm{cor}(x,y)=\textrm{cor}(x,z)=\textrm{cor}(y,z)$$ What is the tightest bound you can give for $\rho$?
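
One way to explore the question above numerically: a 3×3 matrix with ones on the diagonal and a common off-diagonal value $\rho$ has eigenvalues $1+2\rho$ (once) and $1-\rho$ (twice), and a valid correlation matrix must have no negative eigenvalues. The sketch below only checks that condition; the function names are hypothetical.

```python
def equicorrelation_eigenvalues(rho, n=3):
    """Eigenvalues of an n x n matrix with 1 on the diagonal and rho elsewhere."""
    return [1 + (n - 1) * rho] + [1 - rho] * (n - 1)


def is_valid_equicorrelation(rho, n=3):
    """True if the equicorrelation matrix is positive semidefinite."""
    return all(ev >= 0 for ev in equicorrelation_eigenvalues(rho, n))


for rho in (-0.6, -0.5, 0.0, 1.0):
    print(rho, is_valid_equicorrelation(rho))
```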
46
votes
5 answers

Does the beta distribution have a conjugate prior?

I know that the beta distribution is conjugate to the binomial. But what is the conjugate prior of the beta? Thank you.
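
For reference, the beta-binomial conjugacy mentioned in the question can be sketched in a few lines: a Beta(a, b) prior on the success probability combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior. This illustrates only that known update, not the conjugate prior of the beta itself; the function names and numbers are made up.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Uniform Beta(1, 1) prior, then 7 successes in 10 trials (hypothetical data).
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
print(post_a, post_b, beta_mean(post_a, post_b))
```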
46
votes
2 answers

Understanding the parameters inside the Negative Binomial Distribution

I was trying to fit my data into various models and figured out that the fitdistr function from library MASS of R gives me Negative Binomial as the best-fit. Now from the wiki page, the definition is given as: NegBin(r,p) distribution describes the…
Legend
  • 4,532
46
votes
3 answers

Why is a likelihood-ratio test distributed chi-squared?

Why is the test statistic of a likelihood ratio test distributed chi-squared? $2(\ln \text{ L}_{\rm alt\ model} - \ln \text{ L}_{\rm null\ model} ) \sim \chi^{2}_{df_{\rm alt}-df_{\rm null}}$
Dr. Beeblebrox
  • 1,302
  • 1
  • 13
  • 18
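
As a small worked instance of the statistic in the question above, the sketch below tests a coin with null success probability 0.5 against an unrestricted probability, so the difference in degrees of freedom is 1. The data (60 heads in 100 flips) and the function names are assumptions for illustration. The p-value uses the fact that a $\chi^2_1$ variable is a squared standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math


def binom_loglik(p, k, n):
    """Binomial log-likelihood without the constant binomial coefficient.

    The coefficient cancels in the likelihood ratio. Requires 0 < p < 1.
    """
    return k * math.log(p) + (n - k) * math.log(1 - p)


def lrt_statistic(k, n, p0=0.5):
    """2 * (ln L_alt - ln L_null) with the alternative fitted by MLE (k / n).

    Requires 0 < k < n so that the fitted probability is strictly inside (0, 1).
    """
    p_hat = k / n
    return 2 * (binom_loglik(p_hat, k, n) - binom_loglik(p0, k, n))


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2))


stat = lrt_statistic(k=60, n=100)
p_value = chi2_sf_df1(stat)
print(round(stat, 3), round(p_value, 4))
```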
46
votes
5 answers

K-fold vs. Monte Carlo cross-validation

I am trying to learn various cross-validation methods, primarily with the intention of applying them to supervised multivariate analysis techniques. Two I have come across are K-fold and Monte Carlo cross-validation techniques. I have read that K-fold is a…
Liam
  • 603
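
To illustrate how the two schemes in the question differ at the level of index splits, here is a rough sketch using only the standard library. In K-fold every observation lands in exactly one test fold; in Monte Carlo (repeated random sub-sampling) each repeat draws a fresh test set, so an observation may be tested several times or never. The function names and parameters are hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def monte_carlo_splits(n, n_test, repeats, seed=0):
    """Yield (train, test) index lists from independent random draws."""
    rng = random.Random(seed)
    for _ in range(repeats):
        test = rng.sample(range(n), n_test)
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


kfold = list(k_fold_splits(n=10, k=5))
mc = list(monte_carlo_splits(n=10, n_test=2, repeats=5))
print(len(kfold), len(mc))
```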
46
votes
6 answers

Are neural networks better than SVMs?

For some time now I have been studying both support vector machines and neural networks and I understand the logic behind each of these techniques. Very briefly described: In a support vector machine, using the kernel trick, you "send" the data…
46
votes
7 answers

What is the minimum recommended number of groups for a random effects factor?

I'm using a mixed model in R (lme4) to analyze some repeated measures data. I have a response variable (fiber content of feces) and 3 fixed effects (body mass, etc.). My study only has 6 participants, with 16 repeated measures for each one (though…
Chris
  • 929
46
votes
3 answers

Why does minimizing the MAE lead to forecasting the median and not the mean?

From the Forecasting: Principles and Practice textbook by Rob J Hyndman and George Athanasopoulos, specifically the section on accuracy measurement: A forecast method that minimizes the MAE will lead to forecasts of the median, while minimizing…
Brans Ds
  • 1,478
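
The claim quoted in the question can be checked numerically on a toy sample: searching a grid of constant forecasts, the one with the smallest mean absolute error should match the sample median, and the one with the smallest mean squared error should match the sample mean. The data and the grid below are made up for illustration.

```python
import statistics


def mae(c, ys):
    """Mean absolute error of the constant forecast c."""
    return sum(abs(y - c) for y in ys) / len(ys)


def mse(c, ys):
    """Mean squared error of the constant forecast c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)


ys = [2.0, 3.0, 5.0, 8.0, 32.0]  # hypothetical skewed sample
grid = [i / 100 for i in range(3201)]  # candidate forecasts 0.00 .. 32.00

best_mae = min(grid, key=lambda c: mae(c, ys))
best_mse = min(grid, key=lambda c: mse(c, ys))
print(best_mae, statistics.median(ys))
print(best_mse, statistics.mean(ys))
```

The skewed value 32 pulls the MSE-optimal forecast well above the MAE-optimal one, which is the practical difference the textbook passage is pointing at.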