Most Popular
1500 questions
41
votes
2 answers
What is the difference between censoring and truncation?
In the book Statistical Models and Methods for Lifetime Data, it is written:
Censoring: When an observation is incomplete due to some random cause.
Truncation: When the incomplete nature of the observation is due to a systematic selection process…
ABC
- 1,705
41
votes
2 answers
Why is lambda "within one standard error from the minimum" a recommended value for lambda in an elastic net regression?
I understand what role lambda plays in an elastic-net regression. And I can understand why one would select lambda.min, the value of lambda that minimizes cross-validated error.
My question is: where in the statistics literature is it recommended to…
jhersh
- 413
41
votes
3 answers
Clojure versus R: advantages and disadvantages for data analysis
I was planning to learn R in the near future. While reading another question, I found out about Clojure. Now I don't know what to do.
I think a big advantage of R for me is that some people in Economics use it, including one of my supervisors (though the…
Vivi
- 1,301
41
votes
5 answers
How to split dataset for time-series prediction?
I have historic sales data from a bakery (daily, over 3 years). Now I want to build a model to predict future sales (using features like weekday, weather variables, etc.).
How should I split the dataset for fitting and evaluating the models?
Does…
tobip
- 1,570
41
votes
4 answers
For plotting with R, should I learn ggplot2 or ggvis?
For plotting with R, should I learn ggplot2 or ggvis? I don't necessarily want to learn both if one of them is superior in any regard. Why does the R community keep creating new packages with overlapping functionality? The introduction blog post does…
qazwsx
- 737
41
votes
2 answers
Should we address multiple comparisons adjustments when using confidence intervals?
Suppose we have a multiple comparisons scenario, such as post hoc inference on pairwise statistics or a multiple regression, where we are making a total of $m$ comparisons. Suppose also that we would like to support inference in these…
Alexis
- 29,850
40
votes
7 answers
How to represent geography or zip code in machine learning model or recommender system?
I am building a model and I think that geographic location is likely to be very good at predicting my target variable. I have the zip code of each of my users. I am not entirely sure about the best way to include zip code as a predictor feature in…
captain_ahab
- 1,512
40
votes
5 answers
Cross-validating time-series analysis
I've been using the caret package in R to build predictive models for classification and regression. Caret provides a unified interface to tune model hyper-parameters by cross-validation or bootstrapping. For example, if you are building a simple…
Zach
- 23,766
40
votes
4 answers
Recall and precision in classification
I have read some definitions of recall and precision, but always in the context of information retrieval. I was wondering if someone could explain this a bit more in a classification context and maybe illustrate some examples. Say for…
Olivier_s_j
- 1,185
40
votes
2 answers
Variance of a function of one random variable
Let's say we have a random variable $X$ with known variance and mean. The question is: what is the variance of $f(X)$ for some given function $f$? The only general method that I'm aware of is the delta method, but it gives only an approximation. Now I'm…
Tomek Tarczynski
- 4,024
40
votes
3 answers
How do I interpret the 'correlations of fixed effects' in my glmer output?
I have the following output:
Generalized linear mixed model fit by the Laplace approximation
Formula: aph.remain ~ sMFS2 +sAG2 +sSHDI2 +sbare +season +crop +(1|landscape)
AIC BIC logLik deviance
4062 4093 -2022 4044
Random…
susie
- 671
40
votes
4 answers
How do you Interpret RMSLE (Root Mean Squared Logarithmic Error)?
I've been doing a machine learning competition where they use RMSLE (Root Mean Squared Logarithmic Error) to evaluate the performance predicting the sale price of a category of equipment. The problem is I'm not sure how to interpret the success of…
Opus
- 401
40
votes
6 answers
Is there a name for the opposite of the gambler's fallacy?
The gambler's fallacy is a fallacy because of the assumed probability and the independence of the events. However, if, after flipping a coin 100 times and obtaining heads each time, I still believe the probability of obtaining tails to be 0.5, am I…
Igor F.
- 9,089
40
votes
3 answers
Propensity score matching - What is the problem?
In estimation of treatment effects a commonly used method is matching. There are of course several techniques used for matching but one of the more popular techniques is propensity-score matching.
However, I sometimes stumble upon contexts where it…
Jesper for President
- 5,520
40
votes
7 answers
Why is using squared error the standard when absolute error is more relevant to most problems?
I recognize that parts of this topic have been discussed on this forum. Some examples:
Is minimizing squared error equivalent to minimizing absolute error? Why squared error is more popular than the latter?
Why square the difference instead of…
Ryan Volpi
- 1,888