Most Popular
1500 questions
36
votes
5 answers
How do I use the SVD in collaborative filtering?
I'm a bit confused about how the SVD is used in collaborative filtering. Suppose I have a social graph, and I build an adjacency matrix from the edges, then take an SVD (let's forget about regularization, learning rates, sparsity optimizations, etc.),…
Vishal
- 1,200
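One common reading of the setup in this question is to build the adjacency matrix, truncate its SVD to rank k, and treat the entries of the low-rank reconstruction as scores for links that are not yet observed. Below is a minimal NumPy sketch under stated assumptions. The four-user graph and k = 2 are hypothetical choices made for illustration. Like the question, it ignores regularization and sparsity.

```python
import numpy as np

# Hypothetical undirected social graph on 4 users (edges made up for illustration)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Full SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)

# Keep only the top-k singular triplets to get a rank-k approximation of A
k = 2  # hypothetical choice of rank
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Treat entries of A_k as affinity scores; rank users that `user` is not yet linked to
user = 0
candidates = [j for j in range(n) if j != user and A[user, j] == 0]
ranked = sorted(candidates, key=lambda j: A_k[user, j], reverse=True)
print(ranked)
```

With the full set of singular triplets, the product reproduces A exactly. Truncation discards the smaller components, and that smoothing is what gives nonzero scores to unobserved pairs.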
36
votes
1 answer
Differences between a statistical model and a probability model?
Applied probability is an important branch of probability and includes computational probability. Since, as I understand it, statistics uses probability theory to construct models for dealing with data, I am wondering what the essential difference…
Honglang Wang
- 945
36
votes
1 answer
Mathematical differences between GBM, XGBoost, LightGBM, CatBoost?
There exist several implementations of the GBDT family of models, such as:
GBM
XGBoost
LightGBM
CatBoost
What are the mathematical differences between these different implementations?
CatBoost seems to outperform the other implementations even by…
Metariat
- 2,526
36
votes
5 answers
Can you overfit by training machine learning algorithms using CV/Bootstrap?
This question may well be too open-ended to get a definitive answer, but hopefully not.
Machine learning algorithms, such as SVM, GBM, Random Forest, etc., generally have some free parameters that, beyond some rule of thumb guidance, need to be tuned…
Bogdanovist
- 6,619
36
votes
7 answers
How to generate numbers based on an arbitrary discrete distribution?
How do I generate numbers based on an arbitrary discrete distribution?
For example, I have a set of numbers that I want to generate. Say they are labelled 1 to 3 as follows:
1: 4%, 2: 50%, 3: 46%
Basically, the percentages are probabilities that…
FurtiveFelon
- 531
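A standard technique for this is inverse-CDF sampling. You draw a uniform number u and return the first value whose cumulative probability exceeds u. The Python sketch below uses the example probabilities from the question. The function name `sample` and the seed are illustrative choices, not part of the question.

```python
import bisect
import itertools
import random

# Values and probabilities from the question: 1: 4%, 2: 50%, 3: 46%
values = [1, 2, 3]
probs = [0.04, 0.50, 0.46]

# Cumulative distribution, approximately [0.04, 0.54, 1.0] (floating-point rounding aside)
cdf = list(itertools.accumulate(probs))

def sample(rng: random.Random) -> int:
    """Inverse-CDF sampling: return the first value whose cumulative probability exceeds u."""
    u = rng.random() * cdf[-1]  # scale by the total in case it is not exactly 1.0
    return values[bisect.bisect_right(cdf, u)]

rng = random.Random(0)
draws = [sample(rng) for _ in range(100_000)]
for v in values:
    print(v, draws.count(v) / len(draws))  # empirical frequencies, close to 0.04, 0.50, 0.46
```

Python's standard library also provides `random.choices(values, weights=probs, k=n)`, which performs weighted sampling directly. The manual version is shown here because it makes the mechanism visible and carries over to other languages.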
36
votes
2 answers
Calculate Transition Matrix (Markov) in R
Is there a way in R (a built-in function) to calculate the transition matrix for a Markov chain from a set of observations?
For example, take a data set like the following and calculate the first-order transition…
B_Miner
- 8,630
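The question asks about an R built-in. As a language-neutral illustration of what such a function computes, here is a Python sketch of the usual maximum-likelihood estimate. It counts consecutive pairs of states and normalizes each row of counts to sum to 1. The helper `transition_matrix` and the example sequence are hypothetical.

```python
from collections import Counter

def transition_matrix(seq):
    """Estimate first-order transition probabilities by counting consecutive pairs."""
    states = sorted(set(seq))
    index = {s: i for i, s in enumerate(states)}
    counts = Counter(zip(seq, seq[1:]))  # (from_state, to_state) -> count
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        P[index[a]][index[b]] = float(c)
    # Normalize each row to sum to 1; a state never followed by another keeps a zero row
    for row in P:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return states, P

# Hypothetical observation sequence
states, P = transition_matrix(list("AABABBBA"))
print(states, P)
```

In this sequence, A is followed once by A and twice by B. B is followed twice by A and twice by B. The rows are therefore [1/3, 2/3] and [1/2, 1/2].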
36
votes
6 answers
Sample size for logistic regression?
I want to make a logistic model from my survey data. It is a small survey of four residential colonies in which only 154 respondents were interviewed. My dependent variable is "satisfactory transition to work". I found that, of the 154 respondents,…
Braj-Stat
- 611
36
votes
3 answers
What is the "baseline" in a precision-recall curve?
I'm trying to understand the precision-recall curve. I understand what precision and recall are, but what I don't understand is the "baseline" value. I was reading this link…
hyeri
- 461
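The usual meaning of the baseline comes from a "no-skill" classifier, one whose scores are independent of the labels. Its expected precision equals the fraction of positive examples at every threshold, so its precision-recall curve is roughly a horizontal line at that prevalence. The small NumPy simulation below assumes a hypothetical 10% positive rate and uniformly random scores.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical labels with about 10% positives
y = (rng.random(100_000) < 0.10).astype(int)
# "No-skill" classifier: scores drawn independently of the labels
scores = rng.random(y.size)

prevalence = y.mean()
results = {}
for t in (0.2, 0.5, 0.8):
    predicted = scores >= t
    tp = int(y[predicted].sum())
    precision = tp / predicted.sum()  # TP / (TP + FP)
    recall = tp / y.sum()  # TP / (TP + FN)
    results[t] = (precision, recall)
    print(f"t={t}: precision={precision:.3f}, recall={recall:.3f}, prevalence={prevalence:.3f}")
```

Recall varies with the threshold, but precision stays near the prevalence. This is why the precision-recall baseline depends on class balance, unlike the fixed diagonal of a ROC curve.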
36
votes
4 answers
LASSO with interaction terms - is it okay if main effects are shrunk to zero?
LASSO regression shrinks coefficients towards zero, thus effectively providing model selection. I believe that in my data there are meaningful interactions between nominal and continuous covariates. Not necessarily, however, are the 'main effects'…
tomka
- 6,572
36
votes
5 answers
Clustering methods that do not require pre-specifying the number of clusters
Are there any "non-parametric" clustering methods for which we don't need to specify the number of clusters, or other parameters such as the number of points per cluster?
Learn_and_Share
- 866
36
votes
2 answers
How to use ordinal logistic regression with random effects?
In my study I will be measuring workload with several metrics: heart-rate variability (HRV), electrodermal activity (EDA), and a subjective scale (IWS). After normalization, the IWS has three values:
Workload lower than normal
Workload is…
Robin Kramer-ten Have
- 639
36
votes
1 answer
What are the properties of a half Cauchy distribution?
I am currently working on a problem in which I need to develop a Markov chain Monte Carlo (MCMC) algorithm for a state space model.
To be able to solve the problem, I have been given the following probability for $\tau$: $p(\tau) =$…
Christoph
- 365
36
votes
4 answers
Why is it important to include a bias correction term for the Adam optimizer for Deep Learning?
I was reading about the Adam optimizer for Deep Learning and came across the following sentence in the new book Deep Learning by Goodfellow, Bengio and Courville:
Adam includes bias corrections to the estimates of both the first-order moments…
Charlie Parker
- 6,866
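Adam's first- and second-moment estimates are exponential moving averages initialized at zero, so in the early steps they are biased toward zero. With a constant gradient g, the first-moment average after t steps is exactly (1 - beta1**t) * g, and dividing by 1 - beta1**t recovers g. The sketch below uses the commonly used defaults beta1 = 0.9 and beta2 = 0.999. The constant gradient is a hypothetical simplification chosen so that the bias is easy to see.

```python
beta1, beta2 = 0.9, 0.999  # commonly used Adam defaults
g = 2.0  # hypothetical constant gradient

m, v = 0.0, 0.0  # moment estimates start at zero, which causes the bias
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * g  # raw first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2  # raw second-moment estimate
    m_hat = m / (1 - beta1**t)  # bias-corrected first moment
    v_hat = v / (1 - beta2**t)  # bias-corrected second moment
    print(t, round(m, 4), round(m_hat, 4), round(v, 6), round(v_hat, 4))
```

At t = 1, the raw estimate m is only 0.2, a tenth of the true gradient, and v is 0.004 instead of 4. The corrected m_hat and v_hat equal g and g**2 from the first step onward. Without the correction, the ratio m / sqrt(v) would differ from the corrected ratio by a factor of sqrt(1 - beta2**t) / (1 - beta1**t). That factor is about 0.32 at t = 1, so the early steps would be noticeably shrunk.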
36
votes
6 answers
Why do some people use -999 or -9999 to replace missing values?
I have a dataset with lots of missing values. In some columns, missing values were replaced with -999, but in other columns they were marked as 'NA'.
Why would we use -999 to represent missing values?
qqqwww
- 503
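The following short illustration shows why a sentinel such as -999 has to be handled explicitly. If it is not converted to a proper missing-value marker before analysis, it is treated as real data and distorts summaries. The column values below are hypothetical.

```python
import numpy as np

# Hypothetical column in which -999 is a sentinel meaning "missing"
raw = np.array([12.0, 15.0, -999.0, 11.0, -999.0, 14.0])

naive_mean = raw.mean()  # silently treats -999 as an actual measurement

clean = np.where(raw == -999, np.nan, raw)  # convert the sentinel to NaN
clean_mean = np.nanmean(clean)  # mean over non-missing entries only
print(naive_mean, clean_mean)
```

Here the naive mean is about -324.3, while the mean of the four real values is 13.0. When loading data with pandas, the `na_values` parameter of `read_csv` can map such sentinels to missing values at read time.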
36
votes
1 answer
Cross-validation misuse (reporting performance for the best hyperparameter value)
Recently I came across a paper that proposes using a k-NN classifier on a specific dataset. The authors used all the data samples available to perform k-fold cross-validation for different k values and reported cross-validation results of the…
Daniel López
- 5,646