Highest Voted Questions - Statistical Analysis Stack Exchange

46

votes

10 answers

What exactly is Big Data?

I have been asked on several occasions the question: What is Big Data? Both by students and by relatives who are picking up the buzz around statistics and ML. I found this CV-post. And I feel that I agree with the only answer there. The…

large-data

asked Sep 18 '15 at 12:10

Gumeo

3,711

46

votes

3 answers

How is Naive Bayes a Linear Classifier?

I've seen the other thread here but I don't think the answer satisfied the actual question. What I have continually read is that Naive Bayes is a linear classifier (e.g., here) (such that it draws a linear decision boundary) using the log odds…

asked Mar 17 '15 at 22:52

Kevin Pei

869

46

votes

4 answers

Are smaller p-values more convincing?

I've been reading up on $p$-values, type 1 error rates, significance levels, power calculations, effect sizes and the Fisher vs Neyman-Pearson debate. This has left me feeling a bit overwhelmed. I apologise for the wall of text, but I felt it was…

asked Feb 14 '15 at 18:35

Zenit

1,846

46

votes

6 answers

Why does increasing the sample size lower the (sampling) variance?

Big picture: I'm trying to understand how increasing the sample size increases the power of an experiment. My lecturer's slides explain this with a picture of 2 normal distributions, one for the null-hypothesis and one for the alternative-hypothesis…

asked Dec 21 '14 at 00:01

user2740

1,376

46

votes

1 answer

What do the numbers in the classification report of sklearn mean?

I have below an example I pulled from sklearn's sklearn.metrics.classification_report documentation. What I don't understand is why there are f1-score, precision and recall values for each class where I believe class is the predictor label? I…

asked Oct 02 '14 at 18:26

jxn

819

46

votes

3 answers

R - Confused on Residual Terminology

Root mean square error; residual sum of squares; residual standard error; mean squared error; test error. I thought I used to understand these terms but the more I do statistics problems the more I have gotten myself confused where I second guess…

asked Aug 07 '14 at 05:57

user3788557

1,629

46

votes

22 answers

Are there any good movies involving mathematics or probability?

Can you suggest some good movies which involve math, probabilities etc? One example is 21. I would also be interested in movies that involve algorithms (e.g. text decryption). In general "geeky" movies with famous scientific theories but no science…

asked May 07 '11 at 11:13

Siato

101

45

votes

4 answers

Ridge, lasso and elastic net

How do ridge, LASSO and elasticnet regularization methods compare? What are their respective advantages and disadvantages? Any good technical paper, or lecture notes would be appreciated as well.

asked Apr 09 '14 at 14:40

user3269

5,152

45

votes

3 answers

Softmax layer in a neural network

I'm trying to add a softmax layer to a neural network trained with backpropagation, so I'm trying to compute its gradient. The softmax output is $h_j = \frac{e^{z_j}}{\sum_i e^{z_i}}$ where $j$ is the output neuron number. If I differentiate it then I…

neural-networks

asked Dec 12 '13 at 12:57

Ran

1,626
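
For readers following the softmax question above, here is a minimal sketch of the gradient the asker is after. It is not taken from the thread's answers. It uses the standard result $\partial h_j / \partial z_k = h_j(\delta_{jk} - h_k)$, and the function names `softmax` and `softmax_jacobian` are hypothetical, chosen only for illustration.

```python
import math


def softmax(z):
    """Softmax h_j = exp(z_j) / sum_i exp(z_i), computed stably.

    Subtracting max(z) before exponentiating does not change the result
    but avoids overflow for large inputs.
    """
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Return J with J[j][k] = dh_j/dz_k = h_j * (delta_jk - h_k)."""
    h = softmax(z)
    n = len(h)
    return [
        [h[j] * ((1.0 if j == k else 0.0) - h[k]) for k in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    z = [1.0, 2.0, 3.0]  # illustrative pre-activations, not from the source
    for row in softmax_jacobian(z):
        print(row)
```

Each row of the Jacobian sums to zero, because the outputs always sum to one. That gives a quick sanity check when wiring this into backpropagation.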

45

votes

1 answer

When and how to use standardized explanatory variables in linear regression

I have 2 simple questions about linear regression: When is it advised to standardize the explanatory variables? Once estimation is carried out with standardized values, how can one predict with new values (how one should standardize the new…

asked Feb 11 '11 at 23:09

teucer

2,051
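
The second part of the standardization question above (how to treat new values at prediction time) can be sketched as follows. This illustrates one common convention, not the accepted answer: the mean and standard deviation are estimated once on the training data and then reused unchanged for new observations. The helper names `fit_standardizer` and `standardize` are hypothetical.

```python
import statistics


def fit_standardizer(train_column):
    """Estimate the mean and sample standard deviation from training data."""
    mean = statistics.fmean(train_column)
    sd = statistics.stdev(train_column)  # needs at least two observations
    if sd == 0:
        raise ValueError("constant column cannot be standardized")
    return mean, sd


def standardize(values, mean, sd):
    """Apply the training mean/sd to any values, including new ones."""
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    train_x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data, not from the source
    mean, sd = fit_standardizer(train_x)
    new_x = [2.5, 6.0]
    # New values are scaled with the *training* statistics, not their own.
    print(standardize(new_x, mean, sd))
```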

45

votes

4 answers

Should covariates that are not statistically significant be 'kept in' when creating a model?

I have several covariates in my calculation for a model, and not all of them are statistically significant. Should I remove those that are not? This question discusses the phenomenon, but does not answer my question: How to interpret…

asked Aug 03 '13 at 18:05

A.M.

689

45

votes

5 answers

Analysis with complex data, anything different?

Say for example you are doing a linear model, but the data $y$ is complex. $ y = x \beta + \epsilon $ My data set is complex, as in all the numbers in $y$ are of the form $(a + bi)$. Is there anything procedurally different when working with such…

asked Jul 31 '13 at 06:50

bill_e

2,831

45

votes

2 answers

How to interpret the output of the summary method for an lm object in R?

I am using sample algae data to understand data mining a bit more. I have used the following commands: data(algae) algae <- algae[-manyNAs(algae),] clean.algae <-knnImputation(algae, k = 10) lm.a1 <- lm(a1 ~ ., data = clean.algae[,…

asked May 17 '13 at 00:02

godzilla

603

45

votes

5 answers

Using LASSO from lars (or glmnet) package in R for variable selection

Sorry if this question comes across a little basic. I am looking to use LASSO variable selection for a multiple linear regression model in R. I have 15 predictors, one of which is categorical (will that cause a problem?). After setting my $x$ and $y$…

asked May 08 '13 at 23:57

James

451

45

votes

2 answers

Mean absolute percentage error (MAPE) in Scikit-learn

How can we calculate the Mean absolute percentage error (MAPE) of our predictions using Python and scikit-learn? From the docs, we have only these 4 metric functions for Regressions: metrics.explained_variance_score(y_true,…

asked May 07 '13 at 16:52

Nyxynyx

995
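
As a rough illustration of the MAPE question above, the metric can be computed by hand in a few lines of plain Python. This is a sketch under the usual definition, $\text{MAPE} = \frac{100}{n}\sum_t \left|\frac{y_t - \hat{y}_t}{y_t}\right|$, and the function name `mape` is an assumption for illustration, not a scikit-learn API. Recent scikit-learn releases also appear to ship `sklearn.metrics.mean_absolute_percentage_error`, which returns a fraction rather than a percentage; check the documentation for your installed version.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero, so that case is rejected here.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Illustrative numbers, not from the source.
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because every error is divided by the true value, MAPE penalizes misses on small true values much more heavily than the same absolute miss on large ones.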
