Most Popular

1500 questions
49
votes
7 answers

Importance of local response normalization in CNN

I've found that ImageNet and other large CNNs make use of local response normalization layers. However, I cannot find much information about them. How important are they and when should they be used? From…
pir
  • 5,056
49
votes
3 answers

Intuitive explanation for density of transformed variable?

Suppose $X$ is a random variable with pdf $f_X(x)$. Then the random variable $Y=X^2$ has the pdf $$f_Y(y)=\begin{cases}\frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y})+f_X(-\sqrt{y})\right) & y \ge 0 \\ 0 & y \lt 0\end{cases}$$ I understand the calculus…
lowndrul
  • 2,117
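
A quick numerical sanity check of the density formula quoted in the question above. This is only a sketch under an assumption the excerpt does not make: $X$ is taken to be standard normal, in which case $Y=X^2$ should follow a chi-square distribution with one degree of freedom. The function names are illustrative, not from the question.

```python
import math

def normal_pdf(x):
    """Standard normal density (assumed choice of f_X for this check)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_of_square(f_x, y):
    """Density of Y = X^2 from the formula in the question.

    y <= 0 is mapped to 0.0 here; at y == 0 the formula divides by zero,
    so treating that point as 0.0 is a simplification of this sketch.
    """
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    return (f_x(root) + f_x(-root)) / (2.0 * root)

def chi2_1_pdf(y):
    """Chi-square density with one degree of freedom, for y > 0."""
    return math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)

# Compare the two densities at a few points; the columns should agree.
for y in (0.1, 1.0, 2.5):
    print(y, density_of_square(normal_pdf, y), chi2_1_pdf(y))
```

The two terms $f_X(\sqrt{y})$ and $f_X(-\sqrt{y})$ correspond to the two values of $x$ that map to the same $y$, and the factor $\frac{1}{2\sqrt{y}}$ is the stretching introduced by the change of variable.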
49
votes
3 answers

Latent Class Analysis vs. Cluster Analysis - differences in inferences?

What are the differences in inferences that can be made from a latent class analysis (LCA) versus a cluster analysis? Is it correct that an LCA assumes an underlying latent variable that gives rise to the classes, whereas the cluster analysis is an…
Brian P
  • 525
49
votes
14 answers

Clarification on interpreting confidence intervals?

My current understanding of the notion "confidence interval with confidence level $1 - \alpha$" is that if we tried to calculate the confidence interval many times (each time with a fresh sample), it would contain the correct parameter $1 - \alpha$…
Elliott
  • 543
48
votes
5 answers

What is the purpose of characteristic functions?

I'm hoping that someone can explain, in layman's terms, what a characteristic function is and how it is used in practice. I've read that it is the Fourier transform of the pdf, so I guess I know what it is, but I still don't understand its purpose.…
Nick
  • 3,537
48
votes
3 answers

Is standardisation before Lasso really necessary?

I have read three main reasons for standardising variables before something such as Lasso regression: 1) Interpretability of coefficients. 2) Ability to rank the coefficient importance by the relative magnitude of post-shrinkage coefficient…
Jase
  • 2,246
48
votes
3 answers

What are correct values for precision and recall when the denominators equal 0?

Precision is defined as: p = true positives / (true positives + false positives) What is the value of precision if (true positives + false positives) = 0? Is it just undefined? Same question for recall: r = true positives / (true positives +…
khatchad
  • 751
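
To make the zero-denominator case in the question above concrete, here is a minimal sketch of the precision formula as quoted. Returning `None` when the denominator is zero is just one possible convention chosen for illustration, not a settled answer to the question; the function name is hypothetical.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP), as defined in the question.

    Returns None when TP + FP == 0. That return value is an assumption of
    this sketch: it marks the result as undefined instead of picking 0 or 1.
    """
    denominator = true_positives + false_positives
    if denominator == 0:
        return None
    return true_positives / denominator

print(precision(3, 1))  # ordinary case
print(precision(0, 0))  # no positive predictions at all
```

Callers then have to decide explicitly how to treat the undefined case, for example by skipping it when averaging over classes.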
48
votes
2 answers

Mixed Effects Model with Nesting

I have data collected from an experiment organized as follows: Two sites, each with 30 trees. 15 are treated, 15 are control at each site. From each tree, we sample three pieces of the stem, and three pieces of the roots, so 6 level 1 samples per…
Erik
  • 535
48
votes
2 answers

Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model

Consider the following figure from Faraway's Linear Models with R (2005, p. 59). The first plot seems to indicate that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed…
Evan Aad
  • 1,433
48
votes
7 answers

Bayesian statistics tutorial

I am trying to get up to speed in Bayesian Statistics. I have a little bit of stats background (STAT 101) but not too much - I think I can understand prior, posterior, and likelihood :D. I don't want to read a Bayesian textbook just yet. I'd prefer…
Andy
  • 1,683
48
votes
4 answers

Feature map for the Gaussian kernel

In SVM, the Gaussian kernel is defined as: $$K(x,y)=\exp\left(-\frac{\|x-y\|_2^2}{2\sigma^2}\right)=\phi(x)^T\phi(y)$$ where $x, y\in \mathbb{R}^n$. I do not know the explicit equation of $\phi$. I want to know it. I also want to know…
Vivian
  • 855
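
For reference, the kernel value $K(x,y)$ defined in the question above can be evaluated directly without knowing $\phi$. The sketch below does only that; it does not construct the feature map the question asks about. The function name and the default `sigma=1.0` are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Evaluate K(x, y) = exp(-||x - y||_2^2 / (2 * sigma^2)).

    x and y are equal-length sequences of numbers; sigma is the bandwidth.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Identical points give 1; the value decays as the points move apart.
print(gaussian_kernel([0.0, 0.0], [0.0, 0.0]))
print(gaussian_kernel([1.0, 0.0], [0.0, 0.0]))
print(gaussian_kernel([3.0, 4.0], [0.0, 0.0], sigma=2.0))
```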
48
votes
1 answer

Does down-sampling change logistic regression coefficients?

If I have a dataset with a very rare positive class, and I down-sample the negative class, then perform a logistic regression, do I need to adjust the regression coefficients to reflect the fact that I changed the prevalence of the positive…
Zach
  • 23,766
48
votes
2 answers

What are the practical differences between the Benjamini & Hochberg (1995) and the Benjamini & Yekutieli (2001) false discovery rate procedures?

My statistics program implements both the Benjamini & Hochberg (1995) and Benjamini & Yekutieli (2001) false discovery rate (FDR) procedures. I have done my best to read through the later paper, but it is quite mathematically dense and I am not…
russellpierce
  • 18,599
48
votes
10 answers

Why do people use p-values instead of computing probability of the model given data?

Roughly speaking a p-value gives a probability of the observed outcome of an experiment given the hypothesis (model). Having this probability (p-value) we want to judge our hypothesis (how likely it is). But wouldn't it be more natural to calculate…
Roman
  • 584
48
votes
5 answers

First R packages source code to study in preparation for writing own package

I'm planning to start writing R packages. I thought it would be good to study the source code of existing packages in order to learn the conventions of package construction. My criteria for good packages to study: Simple statistical/technical…
Jeromy Anglim
  • 44,984