Most Popular

1500 questions
57
votes
6 answers

What are the main theorems in Machine (Deep) Learning?

Ali Rahimi recently gave a very provocative talk at NIPS 2017 comparing current Machine Learning to Alchemy. One of his claims is that we need to get back to theoretical developments, to have simple theorems proving foundational results. When…
user188529
57
votes
6 answers

How to determine best cutoff point and its confidence interval using ROC curve in R?

I have data from a test that could be used to distinguish normal and tumor cells. According to the ROC curve, it looks good for this purpose (the area under the curve is 0.9). My questions are: How to determine the cutoff point for this test and its confidence…
57
votes
5 answers

Statistical inference when the sample "is" the population

Imagine you have to report on the number of candidates who take a given test each year. It seems rather difficult to generalize the observed % of success, for instance, to a wider population, due to the specificity of the target population. So you may…
pbneau
  • 1,251
57
votes
3 answers

Why would R return NA as a lm() coefficient?

I am fitting an lm() model to a data set that includes indicators for the financial quarter (Q1, Q2, Q3, making Q4 a default). Using lm(Y~., data = data) I get NA as the coefficient for Q3, and a warning that one variable was excluded because of…
Fraijo
  • 1,078
57
votes
9 answers

How do R and Python complement each other in data science?

In many tutorials or manuals the narrative seems to imply that R and Python coexist as complementary components of the analysis process. To my untrained eye, however, it seems that both languages sort of do the same thing. So my question is if there…
BioHazZzZard
  • 319
57
votes
10 answers

What are some examples of anachronistic practices in statistics?

I am referring to practices that still maintain their presence, even though the problems (usually computational) they were designed to cope with have been mostly solved. For example, Yates' continuity correction was invented to approximate Fisher's…
Francis
  • 3,150
57
votes
5 answers

When is a biased estimator preferable to unbiased one?

It's obvious many times why one prefers an unbiased estimator. But, are there any circumstances under which we might actually prefer a biased estimator over an unbiased one?
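One standard illustration of this question, sketched here under stated assumptions rather than taken from any answer: for normally distributed data, dividing the sum of squared deviations by n + 1 gives a biased variance estimator whose mean squared error is lower than that of the unbiased n - 1 version. The function names, sample size, replication count, and seed below are all hypothetical choices made for the simulation.

```python
import random

def variance_estimates(sample):
    """Return the sum of squares divided by n-1 (unbiased) and by n+1 (biased)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / (n - 1), ss / (n + 1)

def simulate_mse(n=5, reps=20000, sigma=1.0, seed=0):
    """Monte Carlo estimate of each estimator's MSE for normal data."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    true_var = sigma ** 2
    se_unbiased = 0.0
    se_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        unbiased, biased = variance_estimates(sample)
        se_unbiased += (unbiased - true_var) ** 2
        se_biased += (biased - true_var) ** 2
    return se_unbiased / reps, se_biased / reps

mse_unbiased, mse_biased = simulate_mse()
print("MSE, divisor n-1:", mse_unbiased)
print("MSE, divisor n+1:", mse_biased)
```

With n = 5 and unit variance, the theoretical values are 2/(n-1) = 0.5 and 2/(n+1) ≈ 0.33, so the biased estimator trades a little bias for a larger reduction in variance.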
57
votes
1 answer

How to determine whether or not the y-axis of a graph should start at zero?

One common way to "lie with data" is to use a y-axis scale that makes it seem as if changes are more significant than they really are. When I review scientific publications, or students' lab reports, I am often frustrated by this "data visualization…
ff524
  • 787
57
votes
2 answers

When will L1 regularization work better than L2 and vice versa?

Note: I know that L1 has feature selection property. I am trying to understand which one to choose when feature selection is completely irrelevant. How to decide which regularization (L1 or L2) to use? What are the pros & cons of each of L1 / L2…
57
votes
3 answers

Is there any difference between lm and glm for the gaussian family of glm?

Specifically, I want to know if there is a difference between lm(y ~ x1 + x2) and glm(y ~ x1 + x2, family=gaussian). I think that this particular case of glm is equal to lm. Am I wrong?
57
votes
7 answers

Why zero correlation does not necessarily imply independence

If two variables have 0 correlation, why are they not necessarily independent? Are zero-correlated variables independent under special circumstances? If possible, I am looking for an intuitive explanation, not a highly technical one.
Victor
  • 6,565
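A small worked example may make the question concrete. It is a sketch with a distribution chosen purely for illustration: X is uniform on {-1, 0, 1} and Y = X², so Y is completely determined by X, yet their covariance is exactly zero.

```python
from fractions import Fraction

# Illustrative assumption: X is uniform on {-1, 0, 1} and Y = X**2.
xs = [-1, 0, 1]
p = Fraction(1, 3)  # probability of each value of X

e_x = sum(p * x for x in xs)             # E[X]
e_y = sum(p * x ** 2 for x in xs)        # E[Y] = E[X^2]
e_xy = sum(p * x * x ** 2 for x in xs)   # E[XY] = E[X^3]
covariance = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = p          # X = 0 forces Y = 0
p_product = p * p    # P(X=0) = 1/3 and P(Y=0) = 1/3

print("Cov(X, Y) =", covariance)
print("P(X=0, Y=0) =", p_joint, "but P(X=0) * P(Y=0) =", p_product)
```

Correlation only measures linear association, and the symmetric relationship here has no linear component. One well-known special case where zero correlation does imply independence is when the two variables are jointly normal.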
57
votes
13 answers

Software for drawing Bayesian networks (graphical models)

I am searching for [free] software that can produce nice-looking graphical models. Any suggestions would be appreciated.
C. Reed
  • 517
57
votes
4 answers

Would PCA work for boolean (binary) data types?

I want to reduce the dimensionality of higher order systems and capture most of the covariance on a preferably 2 dimensional or 1 dimensional field. I understand this can be done via principal component analysis, and I have used PCA in many…
57
votes
2 answers

Area under Precision-Recall Curve (AUC of PR-curve) and Average Precision (AP)

Is Average Precision (AP) the Area under the Precision-Recall Curve (AUC of PR-curve)? EDIT: here is a comment about the difference between PR AUC and AP. The AUC is obtained by trapezoidal interpolation of the precision. An alternative and usually…
mrgloom
  • 2,207
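The two quantities mentioned in the excerpt can be compared on a toy ranking. This is only a sketch under assumptions: the labels are made up, the helper names are hypothetical, AP is taken as the step-wise sum of (R_n - R_{n-1}) * P_n, and the trapezoidal curve is assumed to start at recall 0 with precision 1 (conventions for that starting point vary).

```python
from fractions import Fraction

def pr_points(labels_sorted):
    """(recall, precision) after each rank, for labels sorted by decreasing score."""
    total_pos = sum(labels_sorted)
    tp = 0
    points = []
    for k, label in enumerate(labels_sorted, start=1):
        tp += label
        points.append((Fraction(tp, total_pos), Fraction(tp, k)))
    return points

def average_precision(points):
    """Step-wise sum of (R_n - R_{n-1}) * P_n, with no interpolation."""
    ap, prev_recall = Fraction(0), Fraction(0)
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def trapezoidal_pr_auc(points):
    """Trapezoidal rule; assumes the curve starts at (recall=0, precision=1)."""
    auc = Fraction(0)
    prev_recall, prev_precision = Fraction(0), Fraction(1)
    for recall, precision in points:
        auc += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return auc

# Hypothetical ranking: 1 = relevant, 0 = not relevant, best score first.
labels = [1, 0, 1, 1, 0]
points = pr_points(labels)
ap = average_precision(points)
auc = trapezoidal_pr_auc(points)
print("AP =", ap, "trapezoidal PR AUC =", auc)
```

On this ranking the two numbers differ, because the trapezoidal rule linearly interpolates precision between operating points while the AP sum does not.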
56
votes
2 answers

Interpretation of R's output for binomial regression

I'm quite new to binomial data tests, but I needed to do one and now I'm not sure how to interpret the outcome. The y-variable, the response variable, is binomial and the explanatory factors are continuous. This is what I got when…
user40116
  • 681