Highest Voted Questions - Statistical Analysis Stack Exchange

38

votes

1 answer

Benefits of stratified vs random sampling for generating training data in classification

I would like to know if there are any/some advantages of using stratified sampling instead of random sampling, when splitting the original dataset into training and testing set for classification. Also, does stratified sampling introduce more bias…

asked Dec 07 '16 at 21:24

gc5

1,227

38

votes

3 answers

What algorithms need feature scaling, besides SVM?

I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel=linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I got to know that it needs feature scaling to work faster. Then…

asked Nov 06 '16 at 15:09

Aizzaac

1,179

38

votes

3 answers

How to interpret root mean squared error (RMSE) vs standard deviation?

Let's say I have a model that gives me projected values. I calculate RMSE of those values. And then the standard deviation of the actual values. Does it make any sense to compare those two values (variances)? What I think is, if RMSE and…

asked Oct 27 '16 at 16:20

jkim19

481

38

votes

7 answers

Are there algorithms for computing "running" linear or logistic regression parameters?

A paper "Accurately computing running variance" at http://www.johndcook.com/standard_deviation.html shows how to compute running mean, variance and standard deviations. Are there algorithms where the parameters of a linear or logistic regression…

asked Feb 09 '12 at 07:49

adrcuth

38

votes

5 answers

Estimating same model over multiple time series

I have a novice background in time series (some ARIMA estimation/forecasting) and am facing a problem I don't fully understand. Any help would be greatly appreciated. I am analyzing multiple time series, all over the same time interval and all of…

time-series

asked Feb 17 '12 at 13:33

sparc_spread

815

38

votes

3 answers

How to draw neat polygons around scatterplot regions in ggplot2

How do I add a neat polygon around a group of points on a scatterplot? I am using ggplot2 but am disappointed with the results of geom_polygon. The dataset is over there, as a tab-delimited text file. The graph below shows two measures of attitudes…

asked Feb 14 '12 at 17:07

Fr.

1,453

38

votes

5 answers

What are the dangers of violating the homoscedasticity assumption for linear regression?

As an example, consider the ChickWeight data set in R. The variance obviously grows over time, so if I use a simple linear regression like: m <- lm(weight ~ Time*Diet, data=ChickWeight) My questions: Which aspects of the model will be…

asked Feb 14 '12 at 15:50

Dan M.

940

38

votes

5 answers

A measure of "variance" from the covariance matrix?

If the data is 1d, the variance shows the extent to which the data points are different from each other. If the data is multi-dimensional, we'll get a covariance matrix. Is there a measure that gives a single number of how the data points are…

asked Jul 25 '16 at 02:37

dontloo

16,356

38

votes

5 answers

Interpreting negative cosine similarity

My question may be a silly one. So I shall apologize in advance. I was trying to use the GLOVE model pre-trained by Stanford NLP group (link). However, I noticed that my similarity results showed some negative numbers. That immediately prompted me…

asked Feb 26 '16 at 21:47

Patrick the Cat

606

38

votes

1 answer

What is the difference between generalized estimating equations and GLMM?

I'm running a GEE on 3-level unbalanced data, using a logit link. How does this differ (in terms of the conclusions I can draw and the meaning of the coefficients) from a GLM with mixed effects (GLMM) and logit link? More detail: The observations…

asked Oct 20 '11 at 18:52

user6666

38

votes

10 answers

What is your favorite layman's explanation for a difficult statistical concept?

I really enjoy hearing simple explanations to complex problems. What is your favorite analogy or anecdote that explains a difficult statistical concept? My favorite is Murray's explanation of cointegration using a drunkard and her dog. Murray…

asked Jul 19 '10 at 22:43

brotchie

701

38

votes

3 answers

How to tell the difference between linear and non-linear regression models?

I was reading the following link on non linear regression SAS Non Linear. My understanding from reading the first section "Nonlinear Regression vs. Linear Regression" was that the equation below is actually a linear regression, is that correct? If…

asked Apr 28 '15 at 08:07

mHelpMe

687

38

votes

2 answers

What's the difference between the variance and the mean squared error?

I'm surprised this hasn't been asked before, but I cannot find the question on stats.stackexchange. This is the formula to calculate the variance of a normally distributed sample: $$\frac{\sum(X - \bar{X}) ^2}{n-1}$$ This is the formula to calculate…

asked Mar 05 '15 at 19:27

luciano

14,269

38

votes

4 answers

What are the differences between sparse coding and autoencoder?

Sparse coding is defined as learning an over-complete set of basis vectors to represent input vectors (<-- why do we want this). What are the differences between sparse coding and autoencoder? When will we use sparse coding and autoencoder?

asked Oct 07 '14 at 17:44

RockTheStar

12,907

38

votes

5 answers

How to visualize/understand what a neural network is doing?

Neural networks are often treated as "black boxes" due to their complex structure. This is not ideal, as it is often beneficial to have an intuitive grasp of how a model is working internally. What are methods of visualizing how a trained neural…

asked Jun 09 '11 at 17:19

rm999

758
