Highest Voted Questions - Statistical Analysis Stack Exchange

39

votes

5 answers

Will the mean of a set of means always be the same as the mean obtained from the entire set of raw data?

If I have calculated the mean for 4 data sets (which do have different sample sizes), can I then obtain an "overall mean" by calculating the "mean of the means"? If yes, will this "mean of the means" be the same as if I had combined the data from…

asked Jan 12 '15 at 18:01

user66429

391
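
A minimal sketch of the comparison asked about above, assuming four made-up groups of unequal size (the numbers are hypothetical, not the asker's data). It suggests that the unweighted mean of the group means generally differs from the mean of the pooled data, while weighting each group mean by its sample size recovers the pooled mean.

```python
from statistics import mean

# Hypothetical data: four groups with different sample sizes.
groups = [
    [2.0, 4.0],
    [10.0],
    [1.0, 1.0, 1.0, 1.0],
    [6.0, 8.0, 10.0],
]

group_means = [mean(g) for g in groups]
sizes = [len(g) for g in groups]

# Unweighted "mean of the means": every group counts equally.
mean_of_means = mean(group_means)

# Mean of the combined raw data: every observation counts equally.
pooled_mean = mean(x for g in groups for x in g)

# Mean of the means, weighted by each group's sample size.
weighted_mean = sum(n * m for n, m in zip(sizes, group_means)) / sum(sizes)

print(mean_of_means, pooled_mean, weighted_mean)
```

With equal group sizes the weights cancel and the two quantities coincide.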

39

votes

2 answers

How to use both binary and continuous variables together in clustering?

I need to use binary variables (values 0 & 1) in k-means. But k-means only works with continuous variables. I know some people still use these binary variables in k-means ignoring the fact that k-means is only designed for continuous variables. This…

asked Jan 02 '15 at 14:55

GeorgeOfTheRF

5,593

39

votes

7 answers

Are all simulation methods some form of Monte Carlo?

Is there a simulation method that is not Monte Carlo? All simulation methods involve substituting random numbers into the function to find a range of values for the function. So are all simulation methods in essence Monte Carlo methods?

monte-carlo

asked Dec 06 '14 at 15:20

Victor

6,565

39

votes

5 answers

Are decision trees almost always binary trees?

Nearly every decision tree example I've come across happens to be a binary tree. Is this pretty much universal? Do most of the standard algorithms (C4.5, CART, etc.) only support binary trees? From what I gather, CHAID is not limited to binary…

asked Jun 21 '11 at 21:29

Michael McGowan

4,761

39

votes

2 answers

Why use stratified cross validation? Why does this not damage variance related benefit?

I've been told that it is beneficial to use stratified cross validation, especially when response classes are unbalanced. If one purpose of cross-validation is to help account for the randomness of our original training data sample, surely making each…

asked Oct 02 '14 at 16:45

James Owers

647

39

votes

9 answers

What is the relationship between $Y$ and $X$ in this plot?

What is the relationship between $Y$ and $X$ in the following plot? In my view there is a negative linear relationship, but because we have a lot of outliers, the relationship is very weak. Am I right? I want to learn how can we explain…

asked Sep 07 '14 at 15:20

PSS

843

39

votes

1 answer

Covariance of a random vector after a linear transformation

If $\mathbf {Z}$ is random vector and $A$ is a fixed matrix, could someone explain why $$\mathrm{cov}[A \mathbf {Z}]= A \mathrm{cov}[\mathbf {Z}]A^\top.$$

covariance

asked Aug 29 '14 at 13:56

user92612

745
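
One way to see the identity in the question above, sketched under the assumption that $\boldsymbol{\mu} = \mathrm{E}[\mathbf{Z}]$ exists and that $\mathrm{cov}[\mathbf{Z}]$ is defined as $\mathrm{E}[(\mathbf{Z}-\boldsymbol{\mu})(\mathbf{Z}-\boldsymbol{\mu})^\top]$ (the symbol $\boldsymbol{\mu}$ is introduced here for illustration). Since $A$ is fixed, linearity of expectation gives $\mathrm{E}[A\mathbf{Z}] = A\boldsymbol{\mu}$, and then

$$
\begin{aligned}
\mathrm{cov}[A \mathbf{Z}] &= \mathrm{E}\big[(A\mathbf{Z} - A\boldsymbol{\mu})(A\mathbf{Z} - A\boldsymbol{\mu})^\top\big] \\
&= \mathrm{E}\big[A(\mathbf{Z} - \boldsymbol{\mu})(\mathbf{Z} - \boldsymbol{\mu})^\top A^\top\big] \\
&= A\,\mathrm{E}\big[(\mathbf{Z} - \boldsymbol{\mu})(\mathbf{Z} - \boldsymbol{\mu})^\top\big]A^\top
= A\,\mathrm{cov}[\mathbf{Z}]\,A^\top.
\end{aligned}
$$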

39

votes

5 answers

What is the reason the log transformation is used with right-skewed distributions?

I once heard that the log transformation is the most popular one for right-skewed distributions in linear regression or quantile regression. I would like to know: is there any reason underlying this statement? Why is the log transformation suitable for…

asked Jul 11 '14 at 14:50

user3269

5,152

39

votes

4 answers

What is the meaning of the "." (dot) in R?

I'm just reading the book "R in a Nutshell". And it seems as if I skipped the part where the "." as in "sample.formula" was explained. > sample.formula <- as.formula(y~x1+x2) Is sample an object with a field formula as in other languages? And if…

r

asked May 12 '11 at 14:11

Fabian

1,501

39

votes

4 answers

What is theta in a negative binomial regression fitted with R?

I've got a question concerning a negative binomial regression: Suppose that you have the following commands: require(MASS); attach(cars); mod.NB <- glm.nb(dist ~ speed); summary(mod.NB); detach(cars) (Note that cars is a dataset which is available in R,…

asked May 06 '11 at 16:32

MarkDollar

5,955

39

votes

2 answers

How do I know which method of cross validation is best?

I am trying to figure out which cross validation method is best for my situation. The following data are just an example for working through the issue (in R), but my real X data (xmat) are correlated with each other and correlated to different…

asked Jun 15 '14 at 15:25

rdorlearn

3,653

39

votes

5 answers

Difference between Bayesian networks and Markov process?

What is the difference between a Bayesian Network and a Markov process? I believed I understood the principles of both, but now when I need to compare the two I feel lost. They mean almost the same to me. Surely they are not. Links to other…

asked May 26 '14 at 02:46

rockstar

491

38

votes

4 answers

Can anyone clarify the concept of a "sum of random variables"

In my probability class the term "sums of random variables" is constantly used. However, I'm stuck on what exactly that means. Are we talking about the sum of a bunch of realizations from a random variable? If so, doesn't that add up to a single…

asked May 01 '14 at 19:18

Gosset

545
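
A small sketch related to the question above, assuming two independent fair six-sided dice (a hypothetical example, not taken from the asker's class). It treats the sum X + Y as a new random variable defined on every pair of outcomes and tabulates its exact distribution, rather than adding up one particular set of realizations.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)             # outcomes of one fair die (assumed)
pair_prob = Fraction(1, 36)     # probability of each (x, y) pair under independence

# S = X + Y assigns a value to every pair of outcomes,
# so it has a distribution of its own.
dist = Counter()
for x, y in product(faces, faces):
    dist[x + y] += pair_prob

for total in sorted(dist):
    print(total, dist[total])
```

A single roll of both dice yields one number, but the sum itself is described by the whole table of probabilities.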

38

votes

2 answers

Why is the Expectation Maximization algorithm guaranteed to converge to a local optimum?

I have read a couple of explanations of the EM algorithm (e.g. from Bishop's Pattern Recognition and Machine Learning and from Rogers and Girolami's A First Course in Machine Learning). The derivation of EM is OK; I understand it. I also understand why the…

asked Jan 26 '14 at 14:09

michal

1,288

38

votes

1 answer

Does Cox Regression have an underlying Poisson distribution?

Our small team was having a discussion and got stuck. Does anyone know whether Cox regression has an underlying Poisson distribution? We had a debate that maybe Cox regression with constant time at risk will have similarities with Poisson regression…

asked Mar 10 '11 at 09:17

Julie

563
