Most Popular
1500 questions
48
votes
6 answers
What are best practices in identifying interaction effects?
Other than literally testing each possible combination of variables in a model (x1:x2 or x1*x2 ... xn-1 * xn), how do you identify whether an interaction SHOULD or COULD exist between your (hopefully) independent variables?
Brandon Bertelsen
- 7,232
48
votes
6 answers
Why don't linear regression assumptions matter in machine learning?
When I learned linear regression in my statistics class, we were asked to check a few assumptions that need to hold for linear regression to make sense. I won't delve deep into those assumptions; however, these assumptions don't appear when…
kamal tanwar
- 591
48
votes
1 answer
Intuition behind tensor product interactions in GAMs (MGCV package in R)
Generalized additive models are those where
$$
y = \alpha + f_1(x_1) + f_2(x_2) + e_i
$$
for example. The functions are smooth and are to be estimated, usually by penalized splines. mgcv is an R package that does so, and the author (Simon Wood)…
generic_user
- 13,339
48
votes
3 answers
How do DAGs help to reduce bias in causal inference?
I have read in several places that the use of DAGs can help to reduce bias due to confounding, differential selection, mediation, and conditioning on a collider.
I also see the term “backdoor path” a lot.
How do we use DAGs to reduce these biases, and…
LeelaSella
- 2,030
48
votes
5 answers
Fake uniform random numbers: More evenly distributed than true uniform data
I'm looking for a way to generate random numbers that appear to be uniformly distributed -- and every test will show them to be uniform -- except that they are more evenly distributed than true uniform data.
The problem I have with the "true" uniform…
Has QUIT--Anony-Mousse
- 42,358
48
votes
3 answers
When to use a GAM vs GLM
I realize this may be a potentially broad question, but I was wondering whether there are assumptions that indicate the use of a GAM (Generalized additive model) over a GLM (Generalized linear model)?
Someone recently told me that GAMs should only…
mluerig
- 701
48
votes
3 answers
SVM, Overfitting, curse of dimensionality
My dataset is small (120 samples); however, the number of features is large, varying from 1,000 to 200,000. Although I'm doing feature selection to pick a subset of features, it might still overfit.
My first question is, how does SVM handle…
user13420
- 875
48
votes
1 answer
Why KL divergence is non-negative?
Why is KL divergence non-negative?
From the perspective of information theory, I have such an intuitive understanding:
Say there are two ensembles $A$ and $B$ which are composed of the same set of elements labeled by $x$. $p(x)$ and $q(x)$ are…
meTchaikovsky
- 1,832
48
votes
1 answer
PCA objective function: what is the connection between maximizing variance and minimizing error?
The PCA algorithm can be formulated in terms of the correlation matrix (assume the data $X$ has already been normalized and we are only considering projection onto the first PC). The objective function can be written as:
$$ \max_w (Xw)^T(Xw)\; \:…
Cam.Davidson.Pilon
- 12,153
48
votes
8 answers
Pitfalls in time series analysis
I am just starting to teach myself time series analysis. I have noticed that there are a number of potential pitfalls that do not apply to general statistics. So, building on "What are common statistical sins?", I would like to ask:
What…
naught101
- 5,453
48
votes
5 answers
What is the difference between a population and a sample?
What is the difference between a population and a sample? What common variables and statistics are used for each one, and how do those relate to each other?
Baltimark
- 2,268
48
votes
2 answers
What exactly is the alpha in the Dirichlet distribution?
I'm fairly new to Bayesian statistics and I came across a corrected correlation measure, SparCC, that uses the Dirichlet process in the backend of its algorithm. I have been trying to go through the algorithm step-by-step to really understand what…
O.rka
- 1,442
48
votes
4 answers
When should I balance classes in a training data set?
I took an online course where I learned that unbalanced classes in the training data might lead to problems, because classification algorithms go for the majority rule, as it gives good results if the imbalance is too large. In an assignment one had…
48
votes
4 answers
Evaluation measures of goodness or validity of clustering (without having truth labels)
I'm clustering a set of data, but I don't have a ground-truth document that allows me to evaluate the result of the clustering (I have unlabelled data), so I cannot use an external evaluation measure. In this case, are there any efficient evaluation measures -…
shn
- 2,959
48
votes
5 answers
How do I fit a constrained regression in R so that coefficients total = 1?
I see a similar constrained regression here:
Constrained linear regression through a specified point
but my requirement is slightly different. I need the coefficients to add up to 1. Specifically I am regressing the returns of 1 foreign exchange…
Thomas Browne
- 631