Highest Voted Questions - Statistical Analysis Stack Exchange

64

votes

5 answers

How to calculate pseudo-$R^2$ from R's logistic regression?

Christopher Manning's writeup on logistic regression in R shows a logistic regression in R as follows: ced.logr <- glm(ced.del ~ cat + follows + factor(class), family=binomial) Some output: > summary(ced.logr) Call: glm(formula = ced.del ~ cat +…

asked Mar 19 '11 at 22:44

dfrankow

3,376

64

votes

13 answers

Mean absolute deviation vs. standard deviation

In the textbook "New Comprehensive Mathematics for O Level" by Greer (1983), I see averaged deviation calculated like this: Sum up absolute differences between single values and the mean. Then get its average. Throughout the chapter the term mean…

asked Jan 12 '14 at 09:53

itsols

809
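
The calculation described in the excerpt above (sum the absolute differences from the mean, then average them) can be sketched as follows. This is only an illustrative sketch: the function name `mean_absolute_deviation` and the sample data are assumptions made for this example and do not come from the question or the cited textbook. The population standard deviation is computed alongside it for comparison.

```python
from statistics import fmean, pstdev


def mean_absolute_deviation(values):
    """Average of the absolute differences between each value and the mean."""
    center = fmean(values)
    return fmean([abs(v - center) for v in values])


# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mad = mean_absolute_deviation(data)  # averages |x - mean|
sd = pstdev(data)                    # square root of the average of (x - mean)^2

print(f"mean absolute deviation: {mad}")
print(f"population standard deviation: {sd}")
```

Because the standard deviation squares each difference before averaging, large deviations weigh more heavily in it than in the mean absolute deviation, so the two measures generally differ on the same data.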

64

votes

3 answers

Clustering with K-Means and EM: how are they related?

I have studied algorithms for clustering data (unsupervised learning): EM and k-means. I keep reading the following: k-means is a variant of EM, with the assumption that clusters are spherical. Can somebody explain the above sentence? I do…

asked Nov 18 '13 at 11:47

Myna

793

64

votes

3 answers

What is the difference between posterior and posterior predictive distribution?

I understand what a posterior is, but I'm not sure what the latter means. How are the two different? Kevin P Murphy indicated in his textbook, Machine Learning: a Probabilistic Perspective, that it is "an internal belief state". What does that really…

asked Sep 25 '13 at 16:05

A.D

2,494

64

votes

2 answers

Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice?

Computing power considerations aside, are there any reasons to believe that increasing the number of folds in cross-validation leads to better model selection/validation (i.e. that the higher the number of folds the better)? Taking the argument to…

asked Jun 12 '13 at 13:24

Amelio Vazquez-Reina

19,346

64

votes

7 answers

Which permutation test implementation in R to use instead of t-tests (paired and non-paired)?

I have data from an experiment that I analyzed using t-tests. The dependent variable is interval scaled and the data are either unpaired (i.e., 2 groups) or paired (i.e., within-subjects). E.g. (within subjects): x1 <- c(99, 99.5, 65, 100, 99,…

asked Jan 10 '11 at 12:10

Henrik

14,198

64

votes

4 answers

How are regression, the t-test, and the ANOVA all versions of the general linear model?

How are they all versions of the same basic statistical method?

asked May 15 '13 at 00:46

Amahabirsingh

731

64

votes

8 answers

Does it ever make sense to treat categorical data as continuous?

In answering this question on discrete and continuous data I glibly asserted that it rarely makes sense to treat categorical data as continuous. On the face of it that seems self-evident, but intuition is often a poor guide for statistics, or at…

asked Jul 23 '10 at 06:17

walkytalky

1,898

64

votes

47 answers

Most famous statisticians

What are the most important statisticians, and what is it that made them famous? (Reply just one scientist per answer please.)

asked Dec 04 '10 at 00:08

mariana soffer

1,101

64

votes

1 answer

Logistic regression in R resulted in perfect separation (Hauck-Donner phenomenon). Now what?

I'm trying to predict a binary outcome using 50 continuous explanatory variables (the range of most of the variables is $-\infty$ to $\infty$). My data set has almost 24,000 rows. When I run glm in R, I get: Warning messages: 1: glm.fit: algorithm…

asked Dec 12 '12 at 23:59

Dcook

773

64

votes

11 answers

Examples of Bayesian and frequentist approach giving different answers

Note: I am aware of philosophical differences between Bayesian and frequentist statistics. For example "what is the probability that the coin on the table is heads" doesn't make sense in frequentist statistics, since it has either already landed…

asked Nov 13 '12 at 09:13

user541686

1,185

64

votes

4 answers

Are all values within a 95% confidence interval equally likely?

I have found discordant information on the question: "If one constructs a 95% confidence interval (CI) of a difference in means or a difference in proportions, are all values within the CI equally likely? Or, is the point estimate the most likely,…

confidence-interval

asked Oct 19 '12 at 18:32

pmgjones

5,773

64

votes

8 answers

Are Bayesians slaves of the likelihood function?

In his book "All of Statistics", Prof. Larry Wasserman presents the following Example (11.10, page 188). Suppose that we have a density $f$ such that $f(x)=c\,g(x)$, where $g$ is a known (nonnegative, integrable) function, and the normalization…

asked Oct 01 '12 at 21:01

Zen

24,121

64

votes

4 answers

Does the optimal number of trees in a random forest depend on the number of predictors?

Can someone explain why we need a large number of trees in random forest when the number of predictors is large? How can we determine the optimal number of trees?

asked Sep 12 '12 at 14:07

Z Khan

643

64

votes

13 answers

Two-tailed tests... I'm just not convinced. What's the point?

The following excerpt is from the entry, What are the differences between one-tailed and two-tailed tests?, on UCLA's statistics help site. ... consider the consequences of missing an effect in the other direction. Imagine you have developed a new…

asked May 23 '18 at 09:01

FromTheAshes

773
