Highest Voted Questions - Statistical Analysis Stack Exchange

41

votes

2 answers

What is the difference between censoring and truncation?

In the book Statistical Models and Methods for Lifetime Data, it is written: Censoring: when an observation is incomplete due to some random cause. Truncation: when the incomplete nature of the observation is due to a systematic selection process…

asked Mar 30 '15 at 12:19

ABC

1,705

41

votes

2 answers

Why is lambda "within one standard error from the minimum" a recommended value for lambda in an elastic net regression?

I understand what role lambda plays in an elastic-net regression. And I can understand why one would select lambda.min, the value of lambda that minimizes cross-validated error. My question is: where in the statistics literature is it recommended to…

asked Feb 20 '15 at 20:56

jhersh

413
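The "one standard error" rule mentioned in the question above can be sketched in a few lines. This is a minimal illustration, not glmnet's implementation: the function name `choose_lambdas` and the cross-validation numbers are made up for the example, and it assumes you already have a mean CV error and its standard error for each candidate lambda.

```python
def choose_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error. lambda_1se is the largest
    lambda whose mean CV error is within one standard error of that
    minimum, i.e. the most regularized model that is still comparable.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical CV results for four candidate penalties.
lambdas = [0.01, 0.1, 1.0, 10.0]
cv_mean = [1.00, 0.95, 0.99, 1.50]
cv_se = [0.05, 0.05, 0.05, 0.05]

lambda_min, lambda_1se = choose_lambdas(lambdas, cv_mean, cv_se)
```

Here the minimum error is at lambda = 0.1, but lambda = 1.0 is statistically indistinguishable from it under this rule and gives a simpler model.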

41

votes

3 answers

Clojure versus R: advantages and disadvantages for data analysis

I had a plan of learning R in the near future. Reading another question I found out about Clojure. Now I don't know what to do. I think a big advantage of R for me is that some people in Economics use it, including one of my supervisors (though the…

r

asked Jul 19 '10 at 21:26

Vivi

1,301

41

votes

5 answers

How to split dataset for time-series prediction?

I have historic sales data from a bakery (daily, over 3 years). Now I want to build a model to predict future sales (using features like weekday, weather variables, etc.). How should I split the dataset for fitting and evaluating the models? Does…

asked Sep 30 '14 at 16:23

tobip

1,570
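One common way to approach the splitting question above is a rolling-origin (expanding window) scheme, where every test window comes strictly after its training data. The sketch below is only an illustration under that assumption; the function name and the toy sizes are hypothetical, and it works on row positions of data already sorted by date.

```python
def rolling_origin_splits(n_obs, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold trains on everything before its test window, so no
    future observation leaks into the training set.
    """
    first_test_start = n_obs - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Toy example: 10 daily observations, 3 folds, 2-day test windows.
splits = list(rolling_origin_splits(n_obs=10, n_folds=3, test_size=2))
```

With three years of daily data, the same idea applies with, say, month-long test windows; the choice of window length depends on the forecast horizon you care about.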

41

votes

4 answers

For plotting with R, should I learn ggplot2 or ggvis?

For plotting with R, should I learn ggplot2 or ggvis? I don't necessarily want to learn both if one of them is superior in any regard. Why does the R community keep creating new packages with overlapping functionality? The introduction blog post does…

asked Sep 28 '14 at 18:08

qazwsx

737

41

votes

2 answers

Should we address multiple comparisons adjustments when using confidence intervals?

Suppose we have a multiple comparisons scenario such as post hoc inference on pairwise statistics, or like a multiple regression, where we are making a total of $m$ comparisons. Suppose also, that we would like to support inference in these…

asked Sep 09 '14 at 19:09

Alexis

29,850

40

votes

7 answers

How to represent geography or zip code in a machine learning model or recommender system?

I am building a model and I think that geographic location is likely to be very good at predicting my target variable. I have the zip code of each of my users. I am not entirely sure about the best way to include zip code as a predictor feature in…

asked Apr 23 '14 at 18:10

captain_ahab

1,512

40

votes

5 answers

Cross-validating time-series analysis

I've been using the caret package in R to build predictive models for classification and regression. Caret provides a unified interface to tune model hyper-parameters by cross-validation or bootstrapping. For example, if you are building a simple…

asked Mar 26 '11 at 20:50

Zach

23,766

40

votes

4 answers

Recall and precision in classification

I read some definitions of recall and precision, though always in the context of information retrieval. I was wondering if someone could explain this a bit more in a classification context and maybe illustrate some examples. Say for…

asked Jun 26 '13 at 09:22

Olivier_s_j

1,185
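In a classification setting, the two quantities asked about above reduce to counts from the confusion matrix: precision is the share of predicted positives that are truly positive, and recall is the share of true positives that were found. A minimal sketch, with made-up labels and a hypothetical helper name:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels: 4 true positives in the data, 4 predicted positives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
```

In this toy data there are 3 true positives, 1 false positive, and 1 false negative, so both precision and recall equal 3/4.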

40

votes

2 answers

Variance of a function of one random variable

Let's say we have a random variable $X$ with known variance and mean. The question is: what is the variance of $f(X)$ for some given function $f$? The only general method that I'm aware of is the delta method, but it gives only an approximation. Now I'm…

asked Dec 28 '10 at 14:13

Tomek Tarczynski

4,024
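For reference, the delta method mentioned in the question above comes from a first-order Taylor expansion of $f$ around the mean. The symbols $\mu$ and $\sigma^2$ for the mean and variance of $X$ are notation assumed here, not taken from the question, and $f$ is assumed differentiable at $\mu$.

```latex
f(X) \approx f(\mu) + f'(\mu)\,(X - \mu)
\quad\Longrightarrow\quad
\operatorname{Var}\bigl[f(X)\bigr] \approx \bigl[f'(\mu)\bigr]^{2}\,\sigma^{2}
```

For example, with $f(x) = \log x$ and $\mu > 0$, this gives $\operatorname{Var}[\log X] \approx \sigma^2 / \mu^2$. The approximation is only good when $f$ is close to linear over the range where $X$ has most of its mass.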

40

votes

3 answers

How do I interpret the 'correlations of fixed effects' in my glmer output?

I have the following output: Generalized linear mixed model fit by the Laplace approximation Formula: aph.remain ~ sMFS2 +sAG2 +sSHDI2 +sbare +season +crop +(1|landscape) AIC BIC logLik deviance 4062 4093 -2022 4044 Random…

asked Apr 25 '13 at 15:46

susie

671

40

votes

4 answers

How do you interpret RMSLE (Root Mean Squared Logarithmic Error)?

I've been doing a machine learning competition where they use RMSLE (Root Mean Squared Logarithmic Error) to evaluate the performance predicting the sale price of a category of equipment. The problem is I'm not sure how to interpret the success of…

asked Apr 20 '13 at 04:39

Opus

401
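As background for the question above, RMSLE is usually defined as the root mean squared difference between log(1 + predicted) and log(1 + actual). The sketch below assumes that definition (a given competition may define it slightly differently) and uses made-up prices; it shows that the metric responds to relative rather than absolute error.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x) on both sides."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Over-predicting by a factor of two costs about the same at any price level.
cheap = rmsle([100.0], [200.0])
expensive = rmsle([1000.0], [2000.0])
perfect = rmsle([100.0, 1000.0], [100.0, 1000.0])
```

Both `cheap` and `expensive` come out close to log(2), roughly 0.69, so an RMSLE of that size corresponds to predictions that are typically off by about a factor of two.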

40

votes

6 answers

Is there a name for the opposite of the gambler's fallacy?

The gambler's fallacy is a fallacy because of the assumed probability and the independence of the events. However, if, after flipping a coin 100 times and obtaining heads each time, I still believe the probability of obtaining tails to be 0.5, am I…

asked Apr 09 '21 at 10:00

Igor F.

9,089

40

votes

3 answers

Propensity score matching - What is the problem?

In estimation of treatment effects a commonly used method is matching. There are of course several techniques used for matching but one of the more popular techniques is propensity-score matching. However, I sometimes stumble upon contexts where it…

asked Aug 01 '20 at 17:04

Jesper for President

5,520

40

votes

7 answers

Why is using squared error the standard when absolute error is more relevant to most problems?

I recognize that parts of this topic have been discussed on this forum. Some examples: Is minimizing squared error equivalent to minimizing absolute error? Why squared error is more popular than the latter? Why square the difference instead of…

asked Jun 05 '20 at 19:47

Ryan Volpi

1,888
