Most Popular
1500 questions
81
votes
9 answers
What algorithm should I use to detect anomalies on time-series?
Background
I work in a Network Operations Center, where we monitor computer systems and their performance. One of the key metrics is the number of visitors/customers currently connected to our servers. To make it visible, we (the Ops team) collect…
Ilya Khadykin
- 911
81
votes
0 answers
How can a regression be significant yet all predictors be non-significant?
My multiple regression model has a statistically significant F value; however, all beta values are statistically non-significant.
All the regression assumptions are met. No multicollinearity was found. Correlations among all predictors are…
Serene
- 811
81
votes
5 answers
Unified view on shrinkage: what is the relation (if any) between Stein's paradox, ridge regression, and random effects in mixed models?
Consider the following three phenomena.
Stein's paradox: given some data from a multivariate normal distribution in $\mathbb R^n, \: n\ge 3$, the sample mean is not a very good estimator of the true mean. One can obtain an estimate with lower mean…
amoeba
- 104,745
81
votes
2 answers
Multivariate multiple regression in R
I have 2 dependent variables (DVs), each of whose scores may be influenced by a set of 7 independent variables (IVs). The DVs are continuous, while the set of IVs consists of a mix of continuous and binary-coded variables. (In the code below, continuous…
Andrej
- 2,213
81
votes
1 answer
Understanding ROC curve
I'm having trouble understanding the ROC curve.
Is there any advantage / improvement in area under the ROC curve if I build different models from each unique subset of the training set and use it to produce a probability?
For example, if $y$ has…
Tay Shin
- 1,015
80
votes
2 answers
Performance metrics to evaluate unsupervised learning
With respect to unsupervised learning (such as clustering), are there any metrics to evaluate performance?
user3125
- 3,027
80
votes
4 answers
How should tiny $p$-values be reported? (and why does R put a minimum on 2.22e-16?)
For some tests in R, there is a lower limit on the p-value calculations of $2.22 \cdot 10^{-16}$. I'm not sure why it's this number, if there is a good reason for it or if it's just arbitrary. A lot of other stats packages just go to 0.0001, so this…
paul
- 1,402
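The figure quoted in this question matches the machine epsilon of IEEE 754 double precision, which R exposes as `.Machine$double.eps`. A minimal Python check, offered as an illustration and not as an answer from the thread:

```python
import sys

# Machine epsilon for double precision: the gap between 1.0 and the
# next representable floating-point number.
eps = sys.float_info.epsilon

print(eps)                   # about 2.22e-16
print(eps == 2.0 ** -52)     # epsilon is exactly 2^-52
print(1.0 + eps / 2 == 1.0)  # half an epsilon is lost when added to 1.0
```

Values far smaller than this can still be represented as doubles; the limit concerns how reliably a quantity can be distinguished when it is computed relative to 1.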
80
votes
4 answers
Why do transformers use layer norm instead of batch norm?
Both batch norm and layer norm are common normalization techniques for neural network training.
I am wondering why transformers primarily use layer norm.
SantoshGupta7
- 1,139
80
votes
7 answers
Do not vote, one vote will not reverse election results. What is wrong with this reasoning?
Do not vote, one vote will not reverse the election result. What's
more, the probability of injury in a traffic collision on the way to the
ballot box is much higher than your vote reversing the election
result. What is even more, the…
Przemyslaw Remin
- 1,188
80
votes
3 answers
Diagnostics for logistic regression?
For linear regression, we can check the diagnostic plots (residuals plots, Normal QQ plots, etc) to check if the assumptions of linear regression are violated.
For logistic regression, I am having trouble finding resources that explain how to…
ialm
- 1,827
80
votes
8 answers
Is there a name for the phenomenon of false positives counterintuitively outstripping true positives?
It seems very counterintuitive to many people that a given diagnostic test with very high accuracy (say 99%) can generate massively more false positives than true positives in some situations, namely where the population of true positives is very…
Roger Heathcote
- 903
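The arithmetic behind this question can be sketched as follows. The numbers are assumptions chosen for illustration only: "99% accuracy" is read as 99% sensitivity and 99% specificity, and the prevalence (0.1%) and population size are hypothetical.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and positive predictive value."""
    positives = population * prevalence          # people who have the condition
    negatives = population - positives           # people who do not
    true_pos = positives * sensitivity           # correctly flagged
    false_pos = negatives * (1.0 - specificity)  # wrongly flagged
    ppv = true_pos / (true_pos + false_pos)      # P(condition | positive test)
    return true_pos, false_pos, ppv

# Hypothetical scenario: 1,000,000 people tested, 0.1% prevalence.
tp, fp, ppv = expected_counts(1_000_000, 0.001, 0.99, 0.99)
print(f"true positives ~ {tp:.0f}, false positives ~ {fp:.0f}, PPV ~ {ppv:.3f}")
```

Under these assumptions, false positives outnumber true positives by roughly ten to one, even though the test is right 99% of the time on each individual.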
80
votes
4 answers
Why does including latitude and longitude in a GAM account for spatial autocorrelation?
I have produced generalized additive models for deforestation. To account for spatial autocorrelation, I have included latitude and longitude as a smoothed interaction term (i.e. s(x,y)).
I've based this on reading many papers where the authors say…
gisol
- 1,003
80
votes
4 answers
F1/Dice-Score vs IoU
I was confused about the differences between the F1 score, Dice score and IoU (intersection over union). I have since found out that F1 and Dice mean the same thing (right?) and that IoU has a very similar formula to the other two.
F1 / Dice:…
pietz
- 903
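For readers comparing the two scores, here is a small sketch using the standard set definitions, Dice $= 2|A \cap B| / (|A| + |B|)$ and IoU $= |A \cap B| / |A \cup B|$. The example sets are hypothetical, and the functions assume the inputs are not both empty.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: twice the overlap divided by the total size."""
    return 2 * len(a & b) / (len(a) + len(b))

def iou(a: set, b: set) -> float:
    """Intersection over union (Jaccard index)."""
    return len(a & b) / len(a | b)

# Hypothetical predicted and ground-truth pixel indices.
pred = {1, 2, 3, 4}
truth = {3, 4, 5, 6, 7, 8}

d, j = dice(pred, truth), iou(pred, truth)
print(d, j)                 # 0.4 0.25
print(2 * j / (1 + j))      # Dice recovered from IoU
```

The two scores are monotonically related through Dice $= 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$, so they rank a pair of predictions on a single example in the same order; they can differ once scores are averaged over many examples.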
80
votes
3 answers
One-hot vs dummy encoding in Scikit-learn
There are two different ways to encode categorical variables. Say one categorical variable has n values. One-hot encoding converts it into n variables, while dummy encoding converts it into n-1 variables. If we have k categorical variables, each…
Munichong
- 2,095
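The n versus n-1 distinction described in this question can be illustrated with a plain-Python sketch. The helper `encode` and the sample data are hypothetical and are not part of Scikit-learn.

```python
def encode(values, drop_first=False):
    """Return (levels, rows) with one 0/1 column per retained level.

    With drop_first=True the first level (in sorted order) becomes the
    reference category and is represented by an all-zero row.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    rows = [[int(v == level) for level in levels] for v in values]
    return levels, rows

colors = ["red", "green", "blue", "green"]

one_hot_levels, one_hot_rows = encode(colors)                 # n = 3 columns
dummy_levels, dummy_rows = encode(colors, drop_first=True)    # n - 1 = 2 columns

print(one_hot_levels, one_hot_rows)
print(dummy_levels, dummy_rows)
```

In practice, the same choice is available through `OneHotEncoder(drop="first")` in Scikit-learn or `pandas.get_dummies(..., drop_first=True)`.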
80
votes
3 answers
How can an artificial neural network (ANN) be used for unsupervised clustering?
I understand how an artificial neural network (ANN) can be trained in a supervised manner using backpropagation to improve the fit by decreasing the error in the predictions. I have heard that an ANN can be used for unsupervised learning but…
Vass
- 1,675