Most Popular
1500 questions
37
votes
2 answers
Explanation of Spikes in training loss vs. iterations with Adam Optimizer
I am training a neural network using i) SGD and ii) Adam Optimizer. When using normal SGD, I get a smooth training loss vs. iteration curve as seen below (the red one). However, when I used the Adam Optimizer, the training loss curve has some…
Abdul Fatir
- 473
37
votes
5 answers
Why is Poisson regression used for count data?
I understand that for certain datasets such as voting it performs better. Why is Poisson regression used over ordinary linear regression or logistic regression? What is the mathematical motivation for it?
zaxtax
- 543
37
votes
1 answer
"Absolutely continuous random variable" vs. "Continuous random variable"?
In the book Limit Theorems of Probability Theory by Valentin V. Petrov, I saw a distinction between the definitions of a distribution being "continuous" and "absolutely continuous", which is stated as follows:
"...The distribution of the random…
Tian
- 619
37
votes
10 answers
What are the most useful sources of economics data?
When doing research in economics, one frequently needs to verify theoretical conclusions against real data. What are reliable data sources to use and cite? I am mainly interested in sources that provide various statistical data such as GDP, population,…
Karel Petranek
- 341
37
votes
5 answers
What is the fiducial argument and why has it not been accepted?
One of the late contributions of R.A. Fisher was fiducial intervals and fiducial principled arguments. This approach however is nowhere near as popular as frequentist or Bayesian principled arguments.
What is the fiducial argument and why has it…
JohnRos
- 5,684
37
votes
3 answers
Can AUC-ROC be between 0-0.5?
Can AUC-ROC values be between 0-0.5? Does the model ever output values between 0 and 0.5?
Aman
- 613
37
votes
6 answers
What is the difference between 'transfer learning' and 'domain adaptation'?
Is there any difference between 'transfer learning' and 'domain adaptation'?
I don't know about context, but my understanding is that we have some dataset 1 and train on it, after which we have another dataset 2 for which we want to adapt our model…
mrgloom
- 2,207
37
votes
8 answers
What is a standard deviation?
What is a standard deviation, how is it calculated and what is its use in statistics?
Oren Hizkiya
- 919
37
votes
3 answers
How to perform isometric log-ratio transformation
I have data on movement behaviours (time spent sleeping, sedentary, and doing physical activity) that sums to approximately 24 (as in hours per day). I want to create a variable that captures the relative time spent in each of these behaviours -…
Nicole
- 373
37
votes
6 answers
Changing the scale of a variable to 0-100
I have constructed a social capital index using PCA. The index takes both positive and negative values. I want to convert it to a 0-100 scale to make it easier to interpret. Please suggest an easy way to do so.
Sohail Akram
- 379
37
votes
6 answers
Why should we shuffle data while training a neural network?
In the mini-batch training of a neural network, I heard that an important practice is to shuffle the training data before every epoch. Can somebody explain why the shuffling at each epoch helps?
From a Google search, I found the following…
DSKim
- 1,289
37
votes
3 answers
Should training samples randomly drawn for mini-batch training neural nets be drawn without replacement?
We define an epoch as having gone through the entirety of all available training samples, and the mini-batch size as the number of samples over which we average to find the updates to weights/biases needed to descend the gradient.
My question is…
phoenixdown
- 661
37
votes
5 answers
Why are the weights of RNN/LSTM networks shared across time?
I've recently become interested in LSTMs and I was surprised to learn that the weights are shared across time.
I know that if you share the weights across time, then your input time sequences can be of variable length.
With shared weights you…
beeCwright
- 538
37
votes
1 answer
k-NN computational complexity
What is the time complexity of the k-NN algorithm with a naive search approach (no k-d tree or similar)?
I am interested in its time complexity considering also the hyperparameter k. I have found contradictory answers:
O(nd + kn), where n is the…
Daniel López
- 5,646
37
votes
4 answers
Encoding Angle Data for Neural Network
I am training a neural network (details not important) where the target data is a vector of angles (between 0 and 2*pi). I am looking for advice on how to encode this data. Here is what I am currently trying (with limited success):
1) 1-of-C…
Ari Herman
- 629