Most Popular

1500 questions
40
votes
9 answers

Variance of a bounded random variable

Suppose that a random variable is bounded below and above, taking values in [0,1]. How can the variance of such a variable be computed?
Piotr
  • 501
40
votes
3 answers

What's the relation between hierarchical models, neural networks, graphical models, and Bayesian networks?

They all seem to represent random variables by the nodes and (in)dependence via the (possibly directed) edges. I'm especially interested in a Bayesian's point of view.
cespinoza
  • 802
40
votes
4 answers

Simple way to algorithmically identify a spike in recorded errors

We need an early warning system. I am dealing with a server that is known to have performance issues under load. Errors are recorded in a database along with a timestamp. There are some manual intervention steps that can be taken to decrease the…
dbenton
  • 503
40
votes
7 answers

Can cross validation be used for causal inference?

In all the contexts in which I am familiar with cross-validation, it is used solely with the goal of increasing predictive accuracy. Can the logic of cross-validation be extended to estimating unbiased relationships between variables? While this paper by…
Andy W
  • 16,026
40
votes
1 answer

What are easy-to-interpret goodness-of-fit measures for linear mixed effects models?

I am currently using the R package lme4. I am using linear mixed effects models with random effects: library(lme4) mod1 <- lmer(r1 ~ (1 | site), data = sample_set) #Only random effects mod2 <- lmer(r1 ~ p1 + (1 | site), data = sample_set) #One…
mjburns
  • 1,107
40
votes
7 answers

Why doesn't regularization solve Deep Neural Nets' hunger for data?

An issue I've frequently seen brought up in the context of Neural Networks in general, and Deep Neural Networks in particular, is that they're "data hungry": that is, they don't perform well unless we have a large data set with which to train the…
Skander H.
  • 11,888
40
votes
4 answers

X and Y are not correlated, but X is a significant predictor of Y in multiple regression. What does it mean?

X and Y are not correlated (-.01); however, when I place X in a multiple regression predicting Y, alongside three other (related) variables (A, B, C), X and two of the other variables (A, B) are significant predictors of Y. Note that the two other (A, B)…
Behacad
  • 5,064
40
votes
1 answer

When to choose SARSA vs. Q Learning

SARSA and Q Learning are both reinforcement learning algorithms that work in a similar way. The most striking difference is that SARSA is on-policy while Q Learning is off-policy. The update rules are as follows: Q…
hh32
  • 1,421
40
votes
3 answers

What is the "capacity" of a machine learning model?

I'm studying this Tutorial on Variational Autoencoders by Carl Doersch. On the second page it states: One of the most popular such frameworks is the Variational Autoencoder [1, 3], the subject of this tutorial. The assumptions of this model are…
40
votes
6 answers

Sampling for Imbalanced Data in Regression

There have been good questions on handling imbalanced data in the classification context, but I am wondering what people do to sample for regression. Say the problem domain is very sensitive to the sign but only somewhat sensitive to the magnitude…
someben
  • 798
40
votes
5 answers

What is the difference between Conv1D and Conv2D?

I was going through the Keras convolution docs and found two types of convolution, Conv1D and Conv2D. I did some web searching, and this is what I understand about Conv1D and Conv2D: Conv1D is used for sequences and Conv2D is used for images. I…
Eka
  • 2,251
40
votes
3 answers

Why are Decision Trees not computationally expensive?

In An Introduction to Statistical Learning with Applications in R, the authors write that fitting a decision tree is very fast, but this doesn't make sense to me. The algorithm has to go through every feature and partition it in every way possible…
DataOrc
  • 451
40
votes
5 answers

How is the cost function from Logistic Regression differentiated?

I am doing the Machine Learning Stanford course on Coursera. In the chapter on Logistic Regression, the cost function is this: Then, it is differentiated here: I tried getting the derivative of the cost function, but I got something completely…
bsky
  • 1,199
40
votes
4 answers

ROC vs Precision-recall curves on imbalanced dataset

I just finished reading this discussion. They argue that PR AUC is better than ROC AUC on imbalanced datasets. For example, we have 10 samples in the test dataset. 9 samples are positive and 1 is negative. We have a terrible model which predicts…
40
votes
4 answers

Raw or orthogonal polynomial regression?

I want to regress a variable $y$ onto $x,x^2,\ldots,x^5$. Should I do this using raw or orthogonal polynomials? I looked at the question on the site that deals with these, but I don't really understand what the difference between using them is. Why…
l7ll7
  • 1,275