Most Popular
1500 questions
46
votes
10 answers
What exactly is Big Data?
I have been asked the following question on several occasions:
What is Big Data?
It comes both from students and from relatives who are picking up the buzz around statistics and ML.
I found this CV post, and I agree with the only answer there.
The…
Gumeo
- 3,711
46
votes
3 answers
How is Naive Bayes a Linear Classifier?
I've seen the other thread here, but I don't think the answer addressed the actual question. What I have repeatedly read is that Naive Bayes is a linear classifier (e.g., here), meaning that it draws a linear decision boundary, using the log odds…
Kevin Pei
- 869
46
votes
4 answers
Are smaller p-values more convincing?
I've been reading up on $p$-values, type 1 error rates, significance levels, power calculations, effect sizes and the Fisher vs Neyman-Pearson debate. This has left me feeling a bit overwhelmed. I apologise for the wall of text, but I felt it was…
Zenit
- 1,846
46
votes
6 answers
Why does increasing the sample size lower the (sampling) variance?
Big picture:
I'm trying to understand how increasing the sample size increases the power of an experiment.
My lecturer's slides explain this with a picture of 2 normal distributions, one for the null-hypothesis and one for the alternative-hypothesis…
user2740
- 1,376
46
votes
1 answer
What do the numbers in the classification report of sklearn mean?
Below is an example I pulled from sklearn's sklearn.metrics.classification_report documentation.
What I don't understand is why there are f1-score, precision and recall values for each class where I believe class is the predictor label? I…
jxn
- 819
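As a rough illustration of what the per-class columns in such a report represent, here is a minimal sketch in plain Python. The helper name `per_class_report` and the labels in the usage note are hypothetical and are not taken from the question or from scikit-learn. Each class is treated in turn as the "positive" class, and precision, recall, F1 and support are computed from the resulting counts.

```python
def per_class_report(y_true, y_pred):
    """Compute precision, recall, F1 and support for every class label.

    Hypothetical helper for illustration; it is not scikit-learn's implementation.
    """
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never occurs.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true instances of this class
        }
    return report
```

Calling `per_class_report([0, 1, 2, 2, 2], [0, 0, 2, 2, 1])` returns one row per class, which is why a report of this kind lists a separate precision, recall and F1 value for each label.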
46
votes
3 answers
R - Confused on Residual Terminology
Root mean square error
residual sum of squares
residual standard error
mean squared error
test error
I thought I understood these terms, but the more statistics problems I do, the more confused I have become, to the point where I second-guess…
user3788557
- 1,629
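A minimal sketch of how the terms listed above relate to each other for a simple linear regression, in plain Python. The function names are hypothetical, and the conventions are assumptions: MSE is taken as RSS divided by n, and residual standard error as the square root of RSS divided by (n - p) with p = 2 fitted coefficients. Textbooks and software differ on these denominators. Test error would be the same kind of quantity computed on data that was not used for fitting.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_summaries(x, y):
    """Return RSS, MSE, RMSE and residual standard error for the fitted line."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n, p = len(x), 2  # p counts the intercept and the slope
    rss = sum(r * r for r in residuals)   # residual sum of squares
    mse = rss / n                         # mean squared error (assumed convention)
    return {
        "rss": rss,
        "mse": mse,
        "rmse": math.sqrt(mse),           # root mean square error
        "rse": math.sqrt(rss / (n - p)),  # residual standard error
    }
```

Under these conventions the residual standard error is always at least as large as the RMSE, because it divides by the residual degrees of freedom rather than by n.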
46
votes
22 answers
Are there any good movies involving mathematics or probability?
Can you suggest some good movies that involve math, probability, etc.? One example is 21. I would also be interested in movies that involve algorithms (e.g. text decryption). In general "geeky" movies with famous scientific theories but no science…
Siato
- 101
45
votes
4 answers
Ridge, lasso and elastic net
How do the ridge, LASSO and elastic net regularization methods compare? What are their respective advantages and disadvantages? Any good technical papers or lecture notes would be appreciated as well.
user3269
- 5,152
45
votes
3 answers
Softmax layer in a neural network
I'm trying to add a softmax layer to a neural network trained with backpropagation, so I'm trying to compute its gradient.
The softmax output is $h_j = \frac{e^{z_j}}{\sum_i e^{z_i}}$, where $j$ is the output neuron number.
If I derive it then I…
Ran
- 1,626
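For reference, a minimal sketch of the softmax and its Jacobian in plain Python. The function names `softmax` and `softmax_jacobian` are illustrative and do not come from the question. The derivative used is the standard result $\partial h_j / \partial z_k = h_j(\delta_{jk} - h_k)$, and subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the output unchanged.

```python
import math

def softmax(z):
    """Softmax of a list of floats, shifted by max(z) to avoid overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Return J with J[j][k] = dh_j/dz_k = h_j * (delta_jk - h_k)."""
    h = softmax(z)
    n = len(h)
    return [
        [h[j] * ((1.0 if j == k else 0.0) - h[k]) for k in range(n)]
        for j in range(n)
    ]
```

Each row of the Jacobian sums to zero, since the outputs always sum to one. That is a quick sanity check on a hand-derived gradient.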
45
votes
1 answer
When and how to use standardized explanatory variables in linear regression
I have 2 simple questions about linear regression:
When is it advised to standardize the explanatory variables?
Once estimation is carried out with standardized values, how can one predict with new values (how one should standardize the new…
teucer
- 2,051
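On the second point, one common approach (sketched here under assumptions, not as the accepted answer) is to reuse the mean and standard deviation estimated from the training data when standardizing new values. The function names `fit_scaler` and `standardize` are hypothetical, and the sample standard deviation with an n - 1 denominator is an assumed convention.

```python
import math

def fit_scaler(column):
    """Estimate mean and sample standard deviation from the training column."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return mean, sd

def standardize(values, mean, sd):
    """Standardize values with a previously estimated mean and sd."""
    return [(v - mean) / sd for v in values]

# Usage: estimate the parameters on training data only, then apply them to new data.
train_x = [2.0, 4.0, 6.0, 8.0]
mean, sd = fit_scaler(train_x)
train_z = standardize(train_x, mean, sd)
new_z = standardize([5.0, 10.0], mean, sd)  # new values use the training mean and sd
```

The new observations are not used to recompute the mean or standard deviation, so the fitted coefficients keep the same meaning at prediction time.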
45
votes
4 answers
Should covariates that are not statistically significant be 'kept in' when creating a model?
I have several covariates in my calculation for a model, and not all of them are statistically significant. Should I remove those that are not?
This question discusses the phenomenon, but does not answer my question:
How to interpret…
A.M.
- 689
45
votes
5 answers
Analysis with complex data, anything different?
Say, for example, you are fitting a linear model, but the data $y$ is complex.
$ y = x \beta + \epsilon $
My data set is complex, as in all the numbers in $y$ are of the form $(a + bi)$. Is there anything procedurally different when working with such…
bill_e
- 2,831
45
votes
2 answers
How to interpret the output of the summary method for an lm object in R?
I am using sample algae data to understand data mining a bit more. I have used the following commands:
data(algae)
algae <- algae[-manyNAs(algae),]
clean.algae <- knnImputation(algae, k = 10)
lm.a1 <- lm(a1 ~ ., data = clean.algae[,…
godzilla
- 603
45
votes
5 answers
Using LASSO from lars (or glmnet) package in R for variable selection
Sorry if this question comes across as a little basic.
I am looking to use LASSO variable selection for a multiple linear regression model in R. I have 15 predictors, one of which is categorical (will that cause a problem?). After setting my $x$ and $y$…
James
- 451
45
votes
2 answers
Mean absolute percentage error (MAPE) in Scikit-learn
How can we calculate the Mean absolute percentage error (MAPE) of our predictions using Python and scikit-learn?
From the docs, we have only these 4 metric functions for regression:
metrics.explained_variance_score(y_true,…
Nyxynyx
- 995
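A minimal sketch of MAPE in plain Python, in case a hand-rolled version is acceptable. The function name `mape` is hypothetical, and rejecting zero true values is an assumption about how to handle the undefined case. Recent scikit-learn releases (0.24 and later, as far as I know) also provide `sklearn.metrics.mean_absolute_percentage_error`, which returns a fraction rather than a percentage.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Hypothetical helper; raises ValueError when the inputs differ in length
    or when a true value is zero, because the ratio is undefined there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)
```

For example, `mape([100, 200], [110, 180])` averages two relative errors of 10% each.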