Most Popular

1500 questions
40
votes
3 answers

What are the benefits of using ReLU over softplus as activation functions?

It is often mentioned that rectified linear units (ReLU) have superseded softplus units because they are linear and faster to compute. Does softplus still have the advantage of inducing sparsity, or is that restricted to the ReLU? The reason I ask…
brockl33
  • 501
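
To make the sparsity part of this question concrete, here is a minimal sketch that evaluates both activations on a few hypothetical inputs (the input values are assumptions chosen for illustration, not taken from the question). It only shows that ReLU returns exact zeros for non-positive inputs while softplus stays strictly positive on them; it does not settle which unit trains better.

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def softplus(x: float) -> float:
    """Softplus: log(1 + exp(x)), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical pre-activation values, chosen only for illustration.
inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]

relu_out = [relu(x) for x in inputs]
softplus_out = [softplus(x) for x in inputs]

# Count exact zeros: ReLU zeroes out every non-positive input,
# softplus only approaches zero as x becomes very negative.
relu_zeros = sum(1 for y in relu_out if y == 0.0)
softplus_zeros = sum(1 for y in softplus_out if y == 0.0)

for x, r, s in zip(inputs, relu_out, softplus_out):
    print(f"x={x:+.1f}  relu={r:.4f}  softplus={s:.4f}")
print("exact zeros:", relu_zeros, "(ReLU) vs", softplus_zeros, "(softplus)")
```

In floating point, softplus can still underflow to 0.0 for extremely negative inputs, so the "never exactly zero" statement is a mathematical one.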
40
votes
8 answers

Random walk on the edges of a cube

An ant is placed in a corner of a cube and cannot move. A spider starts from the opposite corner, and can move along the cube's edges in any direction $(x,y,z)$ with equal probability $1/3$. On average, how many steps will the spider need to get to…
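
The excerpt is cut off, so the following is a sketch under a stated assumption: that the quantity asked for is the expected number of steps until the spider first reaches the ant's corner. Grouping corners by their edge distance from the ant (1, 2 or 3) gives three linear equations, which the hypothetical helper `solve` below handles with exact fractions.

```python
from fractions import Fraction as F

def solve(a, b):
    """Gauss-Jordan elimination over exact fractions (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]

# E_k = expected steps to reach the ant from a corner at distance k.
#   E1 = 1 + (2/3) E2            (one neighbour is the ant, two are at distance 2)
#   E2 = 1 + (2/3) E1 + (1/3) E3 (two neighbours at distance 1, one at distance 3)
#   E3 = 1 + E2                  (all three neighbours are at distance 2)
a = [
    [F(1), F(-2, 3), F(0)],
    [F(-2, 3), F(1), F(-1, 3)],
    [F(0), F(-1), F(1)],
]
b = [F(1), F(1), F(1)]

e1, e2, e3 = solve(a, b)
print("expected steps from distance 1, 2, 3:", e1, e2, e3)
```

Under this assumption the spider starts at distance 3, so `e3` is the value of interest.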
40
votes
1 answer

Is there any difference between $r^2$ and $R^2$?

The correlation coefficient is usually written with a capital $R$ but sometimes not. I wonder if there really is a difference between $r^2$ and $R^2$? Can $r$ mean something else than a correlation coefficient?
DJack
  • 617
40
votes
7 answers

How to interpret the coefficient of variation?

I am trying to understand the Coefficient of Variation. When I try to apply it to the following two samples of data I am unable to understand how to interpret the results. Let's say sample 1 is $\{0, 5, 7, 12, 11, 17\}$ and sample 2 is $\{10, 15, 17…
Durin
  • 1,033
  • 2
  • 8
  • 20
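
As a worked illustration, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) for sample 1 from the question. Sample 2 is truncated in the excerpt, so it is not reproduced; instead, two hypothetical transformations of sample 1 (a rescaling by 3 and a shift by 100, both assumptions) show how the statistic reacts to scale and location.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)

sample_1 = [0, 5, 7, 12, 11, 17]  # from the question

# Hypothetical transformations, for illustration only.
scaled = [3 * x for x in sample_1]     # change of units
shifted = [x + 100 for x in sample_1]  # change of origin

cv_sample = coefficient_of_variation(sample_1)
cv_scaled = coefficient_of_variation(scaled)
cv_shifted = coefficient_of_variation(shifted)

print(f"CV of sample 1:        {cv_sample:.4f}")
print(f"CV after scaling by 3: {cv_scaled:.4f}")
print(f"CV after adding 100:   {cv_shifted:.4f}")
```

Rescaling leaves the value unchanged, while adding a constant raises the mean without changing the spread and so lowers it. This is why the coefficient of variation is usually reserved for ratio-scale data with a meaningful zero.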
40
votes
5 answers

The meaning of "positive dependency" as a condition to use the usual method for FDR control

Benjamini and Hochberg developed the first (and still most widely used, I think) method for controlling the false discovery rate (FDR). I want to start with a bunch of P values, each for a different comparison, and decide which ones are low enough…
40
votes
2 answers

Is there a way to use the covariance matrix to find coefficients for multiple regression?

For simple linear regression, the regression coefficient is calculable directly from the variance-covariance matrix $C$, by $$ \frac{C_{d,e}}{C_{e,e}} $$ where $d$ is the dependent variable's index, and $e$ is the explanatory variable's index. If one…
David
  • 515
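
The simple-regression formula quoted in this excerpt can be sketched as follows. The data are hypothetical, and the index names `d` and `e` mirror the question's notation. The sketch covers only the single-predictor case described above, not the multiple-regression case the question goes on to ask about.

```python
import statistics

def covariance_matrix(columns):
    """Sample variance-covariance matrix of equally long data columns."""
    n = len(columns[0])
    means = [statistics.mean(col) for col in columns]
    return [
        [
            sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]

# Hypothetical data: column 0 is explanatory (e), column 1 is dependent (d).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e, d = 0, 1

C = covariance_matrix([x, y])
slope = C[d][e] / C[e][e]
intercept = statistics.mean(y) - slope * statistics.mean(x)

# Sanity check: least-squares residuals sum to zero and are orthogonal to x.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"slope={slope:.4f}  intercept={intercept:.4f}")
print(f"sum of residuals={sum(residuals):.2e}")
```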
40
votes
4 answers

What is the difference between the vertical bar and semi-colon notations?

What is the difference in meaning between the notation $P(z;d,w)$ and $P(z|d,w)$ which are commonly used in many books and papers?
Learner
  • 4,457
40
votes
5 answers

Examples of PCA where PCs with low variance are "useful"

Normally in principal component analysis (PCA) the first few PCs are used and the low variance PCs are dropped, as they do not explain much of the variation in the data. However, are there examples where the low variation PCs are useful (i.e. have…
Michael
  • 403
39
votes
3 answers

Application of machine learning methods in StackExchange websites

I have a Machine Learning course this semester and the professor asked us to find a real-world problem and solve it with one of the machine learning methods introduced in class, such as: Decision Trees, Artificial Neural Networks, Support Vector…
Isaac
  • 1,003
39
votes
1 answer

What are the advantages of kernel PCA over standard PCA?

I want to implement an algorithm from a paper that uses kernel SVD to decompose a data matrix, so I have been reading about kernel methods, kernel PCA, and so on. But it is still very obscure to me, especially when it comes to mathematical…
39
votes
1 answer

Why is Mantel's test preferred over Moran's I?

Mantel's test is widely used in biological studies to examine the correlation between the spatial distribution of animals (position in space) with, for example, their genetic relatedness, rate of aggression or some other attribute. Plenty of good…
39
votes
8 answers

Is it possible to prove a null hypothesis?

As the question states - Is it possible to prove the null hypothesis? From my (limited) understanding of hypothesis, the answer is no but I can't come up with a rigorous explanation for it. Does the question have a definitive answer?
39
votes
1 answer

Link Anomaly Detection in Temporal Network

I came across this paper that uses link anomaly detection to predict trending topics, and I found it incredibly intriguing: The paper is "Discovering Emerging Topics in Social Streams via Link Anomaly Detection". I would love to replicate it on a…
Olga Mu
  • 705
39
votes
5 answers

Neural network with skip-layer connections

I am interested in regression with neural networks. Neural networks with zero hidden nodes + skip-layer connections are linear models. What about the same neural nets but with hidden nodes? I am wondering what would be the role of the skip-layer…
Ben
  • 551
39
votes
3 answers

Is it possible to find the combined standard deviation?

Suppose I have 2 sets. Set A: number of items $n = 10$, $\mu = 2.4$, $\sigma = 0.8$. Set B: number of items $n = 5$, $\mu = 2$, $\sigma = 1.2$. I can find the combined mean ($\mu$) easily, but how am I supposed to find the combined standard deviation?
kype
  • 525
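
One way to approach this last question is sketched below, under two assumptions the excerpt does not state: the two sets do not overlap, and each $\sigma$ is a population standard deviation (dividing by $n$). The pooled variance is then the size-weighted average of each set's variance plus its squared distance from the combined mean. The function name `combine` is hypothetical.

```python
import math

def combine(n1, mu1, sd1, n2, mu2, sd2):
    """Combined mean and population SD of two disjoint sets."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    # Within-set variance plus each set's squared offset from the combined mean.
    var = (
        n1 * (sd1 ** 2 + (mu1 - mu) ** 2)
        + n2 * (sd2 ** 2 + (mu2 - mu) ** 2)
    ) / n
    return mu, math.sqrt(var)

# Values from the question: Set A and Set B.
combined_mean, combined_sd = combine(10, 2.4, 0.8, 5, 2.0, 1.2)
print(f"combined mean = {combined_mean:.4f}")
print(f"combined SD   = {combined_sd:.4f}")
```

If the given values are sample standard deviations instead (dividing by $n - 1$), each variance would need to be rescaled by $(n_i - 1)/n_i$ before pooling, and the result would differ slightly.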