Most Popular

1500 questions
36 votes · 5 answers

Wikipedia entry on likelihood seems ambiguous

I have a simple question regarding "conditional probability" and "Likelihood". (I have already surveyed this question here but to no avail.) It starts from the Wikipedia page on likelihood. They say this: The likelihood of a set of parameter…
Creatron • 1,655
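To keep the distinction at issue in view while skimming: the same expression can be read two ways, and the likelihood reading fixes the data rather than the parameter. A minimal worked statement in my own notation (not the asker's or Wikipedia's):

```latex
% For fixed observed data x, the likelihood is read as a function of the
% parameter theta, not as a probability distribution over theta:
\mathcal{L}(\theta \mid x) \;=\; P(X = x \mid \theta)
% Example: X ~ Binomial(10, theta), observed x = 7:
\qquad
\mathcal{L}(\theta \mid x = 7) \;=\; \binom{10}{7}\,\theta^{7}(1-\theta)^{3},
\quad \theta \in [0, 1].
```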
36 votes · 7 answers

Why are symmetric positive definite (SPD) matrices so important?

I know the definition of a symmetric positive definite (SPD) matrix, but want to understand more. Why are they so important, intuitively? Here is what I know. What else? For given data, the covariance matrix is SPD. The covariance matrix is an important…
Haitao Du • 36,852 • 25 • 145 • 242
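As a concrete anchor for the covariance claim in the excerpt, a small numpy check (illustrative data only, not the asker's) that a sample covariance matrix is symmetric with non-negative eigenvalues:

```python
# A sample covariance matrix is symmetric and positive semi-definite
# (and usually positive definite when n > p and there is no collinearity).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # 500 observations, 4 variables
S = np.cov(X, rowvar=False)            # 4 x 4 sample covariance matrix

print(np.allclose(S, S.T))             # symmetric
print(np.linalg.eigvalsh(S) > 0)       # all eigenvalues positive here
```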
36 votes · 3 answers

What are the differences between Logistic Function and Sigmoid Function?

$$f(x)=\frac{L}{1+e^{-k(x-x_0)}}$$ Fig 1. Logistic function $$S(t)=\frac{1}{1+e^{-t}}$$ Fig 2. Sigmoid function What are the differences between Logistic Function and…
Jul • 715
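A quick numeric check of how the two formulas relate (my own sketch): the sigmoid of Fig. 2 is the logistic function of Fig. 1 with L = 1, k = 1, x0 = 0.

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    # General logistic function from Fig. 1.
    return L / (1.0 + np.exp(-k * (x - x0)))

def sigmoid(t):
    # "Standard" sigmoid from Fig. 2.
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-5, 5, 11)
print(np.allclose(logistic(t), sigmoid(t)))   # True: same curve at L=1, k=1, x0=0
```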
36 votes · 3 answers

What is the difference between dropout and drop connect?

What is the difference between dropout and drop connect? AFAIK, dropout randomly drops hidden nodes during training but keeps them in testing, and drop connect drops connections. But isn't dropping connections equivalent to dropping the hidden…
Machina333 • 1,123
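A rough numpy sketch of the distinction the question is asking about (not from either paper's code): dropout masks whole output units, while DropConnect masks individual weights, so a unit can survive with only a random subset of its incoming connections.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)          # layer input
W = rng.normal(size=(4, 5))     # weight matrix
p = 0.5                         # keep probability

# Dropout: zero whole output units (every weight feeding a dropped unit is silenced together).
unit_mask = rng.random(4) < p
h_dropout = unit_mask * (W @ x)

# DropConnect: zero individual connections instead of whole units.
weight_mask = rng.random((4, 5)) < p
h_dropconnect = (weight_mask * W) @ x

print(h_dropout)
print(h_dropconnect)
```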
36 votes · 3 answers

Building an autoencoder in Tensorflow to surpass PCA

Hinton and Salakhutdinov, in Reducing the Dimensionality of Data with Neural Networks (Science, 2006), proposed a non-linear PCA through the use of a deep autoencoder. I have tried to build and train a PCA autoencoder with TensorFlow several times but I…
Donbeo • 3,129
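For orientation only, a minimal Keras sketch of the linear baseline (my own toy code, not the poster's): with linear activations and squared error the bottleneck learns the same subspace as the top principal components; the Hinton–Salakhutdinov architecture then adds depth and non-linearities to go beyond PCA.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")   # toy data
X -= X.mean(axis=0)                                  # center, as for PCA

k = 2  # bottleneck width, playing the role of the number of PCs
inputs = tf.keras.Input(shape=(20,))
code = tf.keras.layers.Dense(k, use_bias=False)(inputs)      # linear encoder
outputs = tf.keras.layers.Dense(20, use_bias=False)(code)    # linear decoder
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=200, batch_size=64, verbose=0)

# Swapping in non-linear activations (e.g. activation="relu") and extra layers
# is what lets the autoencoder surpass PCA, as in the 2006 paper.
```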
36 votes · 2 answers

How to make a reward function in reinforcement learning?

While studying Reinforcement Learning, I have come across many forms of the reward function: $R(s,a)$, $R(s,a,s')$, and even a reward function that only depends on the current state. Having said that, I realized it is not very easy to 'make' or…
cgo • 9,107
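As a concrete instance of the R(s, a, s') form mentioned in the excerpt, a toy grid-world reward; the states, goal, and step cost are all made up for illustration.

```python
# R(s, a, s'): reward depends on the state you land in, with a small
# per-step cost to encourage short paths.
GOAL = (3, 3)
PIT = (1, 3)

def reward(state, action, next_state):
    if next_state == GOAL:
        return 1.0       # reaching the goal
    if next_state == PIT:
        return -1.0      # falling into the pit
    return -0.04         # ordinary step

print(reward((2, 3), "right", (3, 3)))   # 1.0
print(reward((0, 0), "up", (0, 1)))      # -0.04
```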
36 votes · 3 answers

Why are bias nodes used in neural networks?

Why are bias nodes used in neural networks? How many should you use? In which layers should you use them: all hidden layers and the output layer?
grmmhp • 461
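A tiny illustration of what the bias buys (my own example, not from the question): without it, a unit's pre-activation is forced to zero whenever the input is zero, so its output cannot be shifted away from the origin.

```python
import numpy as np

def unit(x, w, b=0.0):
    # One sigmoid unit: sigmoid(w.x + b).
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.zeros(3)
w = np.array([0.5, -1.2, 2.0])
print(unit(x, w))            # always 0.5 at x = 0 when there is no bias
print(unit(x, w, b=-2.0))    # the bias shifts the output off 0.5
```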
36 votes · 2 answers

Understanding distance correlation computations

As far as I understood, distance correlation is a robust and universal way to check if there is a relation between two numeric variables. For example, if we have a set of pairs of numbers: (x1, y1) (x2, y2) ... (xn, yn) we can use distance…
Roman • 584
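For readers who want to see the computation the question refers to, a plain numpy version of the (biased) sample distance correlation for two univariate samples, following the usual double-centering recipe; function and variable names are mine.

```python
import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Pairwise distance matrices of each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(distance_correlation(x, x ** 2))                 # clearly non-zero despite Pearson r near 0
print(distance_correlation(x, rng.normal(size=200)))   # near zero for independent samples
```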
36 votes · 2 answers

Why is Lasso penalty equivalent to the double exponential (Laplace) prior?

I have read in a number of references that the Lasso estimate for the regression parameter vector $B$ is equivalent to the posterior mode of $B$ in which the prior distribution for each $B_i$ is a double exponential distribution (also known as…
Wintermute • 1,317
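The claimed equivalence is a one-line MAP calculation; here is a sketch with my own symbols ($\sigma^2$ for the noise variance, $\tau$ for the prior scale), using the convention of minimizing $\lVert y - XB\rVert^2 + \lambda\sum_i |B_i|$ for the Lasso.

```latex
% Gaussian likelihood plus independent double-exponential (Laplace) priors
%   p(B_i) = (1 / 2tau) * exp(-|B_i| / tau)
% give the negative log-posterior
-\log p(B \mid y, X)
  \;=\; \frac{1}{2\sigma^{2}}\,\lVert y - XB \rVert_{2}^{2}
  \;+\; \frac{1}{\tau}\sum_{i} \lvert B_i \rvert \;+\; \text{const},
% so the posterior mode solves the Lasso problem with lambda = 2*sigma^2 / tau.
```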
36 votes · 1 answer

predict() Function for lmer Mixed Effects Models

The problem: I have read in other posts that predict is not available for mixed effects lmer {lme4} models in [R]. I tried exploring this subject with a toy dataset... Background: The dataset is adapted from this source, and available…
36 votes · 3 answers

Pre-training in deep convolutional neural network?

Has anyone seen any literature on pre-training in deep convolutional neural networks? I have only seen unsupervised pre-training in autoencoders or restricted Boltzmann machines.
RockTheStar • 12,907 • 34 • 71 • 96
36 votes · 6 answers

Interpretation of Shapiro-Wilk test

I'm pretty new to statistics and I need your help. I have a small sample, as follows: H4U 0.269 0.357 0.2 0.221 0.275 0.277 0.253 0.127 0.246 I ran the Shapiro-Wilk test using…
Jakub • 737
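A quick way to reproduce the test on the sample listed in the excerpt (using SciPy rather than the asker's tool), plus the reading of the result that usually trips people up:

```python
from scipy import stats

# The H4U sample quoted in the question.
h4u = [0.269, 0.357, 0.2, 0.221, 0.275, 0.277, 0.253, 0.127, 0.246]
stat, p = stats.shapiro(h4u)
print(f"W = {stat:.3f}, p = {p:.3f}")

# The null hypothesis is that the data come from a normal distribution,
# so a large p-value means no evidence *against* normality was found,
# not a demonstration that the data are normal.
```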
36 votes · 2 answers

What is "reduced-rank regression" all about?

I have been reading The Elements of Statistical Learning and I could not understand what Section 3.7 "Multiple outcome shrinkage and selection" is all about. It talks about RRR (reduced-rank regression), and I can only understand that the premise is…
cgo • 9,107
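A compact sketch of the unweighted special case may help fix ideas (my own toy code; the formulation in Section 3.7 of ESL additionally whitens the responses by the error covariance): reduced-rank regression keeps only the top-r directions of the OLS fitted values, so the coefficient matrix has rank at most r.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # centered predictors (n x p)
Y = X @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(200, 4))  # responses (n x q)

r = 2                                               # target rank
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # full-rank OLS coefficient matrix
Y_hat = X @ B_ols
_, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
V_r = Vt[:r].T                                      # top-r right singular vectors of the fits
B_rrr = B_ols @ V_r @ V_r.T                         # rank-r coefficient matrix
print(np.linalg.matrix_rank(B_rrr))                 # 2
```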
36 votes · 3 answers

How to perform orthogonal regression (total least squares) via PCA?

I always use lm() in R to perform linear regression of $y$ on $x$. That function returns a coefficient $\beta$ such that $$y = \beta x.$$ Today I learned about total least squares and that the princomp() function (principal component analysis, PCA) can…
Dail • 2,637 • 12 • 44 • 54
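The recipe the question is after, written out in numpy rather than R's princomp() (a sketch with simulated data): center x and y, take the first principal direction of the two-column cloud, and read the total-least-squares slope off its loadings.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

data = np.column_stack([x, y])
data_centered = data - data.mean(axis=0)
# Principal directions = eigenvectors of the 2x2 covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(data_centered, rowvar=False))
v = eigvecs[:, np.argmax(eigvals)]        # first principal direction
slope = v[1] / v[0]                       # orthogonal (TLS) slope
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```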
36 votes · 3 answers

Outlier Detection on skewed Distributions

Under a classical definition of an outlier as a data point outside 1.5 × IQR from the upper or lower quartile, there is an assumption of a non-skewed distribution. For skewed distributions (exponential, Poisson, geometric, etc.), is the best way to…
Eric • 361
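A small illustration of why the plain rule misbehaves, not a recommendation of any particular fix: on an exponential sample the 1.5 × IQR rule flags a large share of ordinary tail points, while the same rule applied on a log scale (one common workaround) flags far fewer.

```python
import numpy as np

def iqr_outliers(z):
    # Classical boxplot rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, q3 = np.percentile(z, [25, 75])
    iqr = q3 - q1
    return (z < q1 - 1.5 * iqr) | (z > q3 + 1.5 * iqr)

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)

print(iqr_outliers(x).sum())          # many ordinary tail points flagged on the raw scale
print(iqr_outliers(np.log(x)).sum())  # far fewer after a log transform
```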