Highest Voted Questions - Statistical Analysis Stack Exchange

95

votes

4 answers

What is the difference between a "link function" and a "canonical link function" for GLM

What's the difference between the terms 'link function' and 'canonical link function'? Also, are there any (theoretical) advantages of using one over the other? For example, a binary response variable can be modeled using many link functions such as…

asked Oct 21 '12 at 14:17

steadyfish

1,922

95

votes

7 answers

Explain the difference between multiple regression and multivariate regression, with minimal use of symbols/math

Are multiple and multivariate regression really different? What is a variate anyways?

asked Sep 03 '10 at 18:54

Neil McGuigan

9,872

95

votes

5 answers

Understanding the role of the discount factor in reinforcement learning

I'm teaching myself about reinforcement learning, and trying to understand the concept of discounted reward. So the reward is necessary to tell the system which state-action pairs are good, and which are bad. But what I don't understand is why the…

asked Jun 30 '16 at 07:53

Karnivaurus

7,019

95

votes

6 answers

Essential data checking tests

In my job role I often work with other people's datasets: non-experts bring me clinical data and I help them to summarise it and perform statistical tests. The problem I am having is that the datasets I am brought are almost always riddled with…

asked Jun 07 '11 at 08:19

Chris Beeley

5,761

94

votes

7 answers

How to generate uniformly distributed points on the surface of the 3-d unit sphere?

I am wondering how to generate uniformly distributed points on the surface of the 3-d unit sphere? Also after generating those points, what is the best way to visualize and check whether they are truly uniform on the surface $x^2+y^2+z^2=1$?

random-generation

asked Mar 07 '11 at 22:57

Qiang Li

1,295

94

votes

6 answers

Feature selection for "final" model when performing cross-validation in machine learning

I am getting a bit confused about feature selection and machine learning and I was wondering if you could help me out. I have a microarray dataset that is classified into two groups and has 1000s of features. My aim is to get a small number of…

asked Sep 02 '10 at 10:25

danielsbrewer

2,495

94

votes

3 answers

Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity?

The ridge regression coefficient estimates $\hat{\beta}^R$ are the values that minimize $$ \text{RSS} + \lambda \sum_{j=1}^p\beta_j^2. $$ My questions are: If $\lambda = 0$, then we see that the expression above reduces to the usual RSS. What if…

asked May 07 '15 at 18:54

cgo

9,107

94

votes

5 answers

What do the residuals in a logistic regression mean?

In answering this question John Christie suggested that the fit of logistic regression models should be assessed by evaluating the residuals. I'm familiar with how to interpret residuals in OLS: they are in the same scale as the DV and very clearly…

asked Aug 09 '10 at 07:32

russellpierce

18,599

94

votes

4 answers

Can bootstrap be seen as a "cure" for the small sample size?

This question has been triggered by something I read in this graduate-level statistics textbook and also (independently) heard during this presentation at a statistical seminar. In both cases, the statement was along the lines of "because the sample…

asked Aug 16 '14 at 20:23

James

2,870

93

votes

7 answers

What are principal component scores?

What are principal component scores (PC scores, PCA scores)?

asked Jul 20 '10 at 05:37

vrish88

1,213

93

votes

2 answers

Resampling / simulation methods: Monte Carlo, bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests

I am trying to understand the difference between different resampling methods (Monte Carlo simulation, parametric bootstrapping, non-parametric bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests) and their…

asked Jun 19 '14 at 17:59

Ram Sharma

2,436

92

votes

7 answers

Euclidean distance is usually not good for sparse data (and more general case)?

I have seen somewhere that classical distances (like Euclidean distance) become weakly discriminant when we have multidimensional and sparse data. Why? Do you have an example of two sparse data vectors where the Euclidean distance does not perform…

asked Jun 01 '12 at 13:55

shn

2,959

92

votes

7 answers

How to efficiently manage a statistical analysis project?

We often hear of project management and design patterns in computer science, but less frequently in statistical analysis. However, it seems that a decisive step toward designing an effective and durable statistical project is to keep things…

project-management

asked Sep 20 '10 at 20:39

chl

53,725

92

votes

4 answers

Why not approach classification through regression?

Some material I've seen on machine learning said that it's a bad idea to approach a classification problem through regression. But I think it's always possible to do a continuous regression to fit the data and truncate the continuous prediction to…

asked Feb 05 '12 at 05:43

Strin

1,021

91

votes

10 answers

Why is it possible to get significant F statistic (p<.001) but non-significant regressor t-tests?

In a multiple linear regression, why is it possible to have a highly significant F statistic (p<.001) but have very high p-values on all the regressors' t-tests? In my model, there are 10 regressors. One has a p-value of 0.1 and the rest are above…

asked Oct 13 '10 at 09:40

Ηλίας

1,569
