Most Popular
1500 questions
38 votes, 4 answers
Is a strong background in maths a total requisite for ML?
I'd like to start advancing my skill set, and I've always been fascinated by machine learning. However, six years ago, instead of pursuing this, I decided to take a degree completely unrelated to computer science.
I have been developing…
Layke
- 503
38 votes, 4 answers
Determine different clusters of 1d data from database
I have a database table of data transfers between different nodes. It is a huge database (nearly 40 million transfers). One of the attributes is the number of bytes transferred (nbytes), which ranges from 0 bytes to 2 terabytes. I would like to…
Shaun
- 381
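One common way to approach this, offered here as an assumption rather than as the asker's method, is to cluster on a log scale, because byte counts spanning 0 bytes to 2 terabytes differ by many orders of magnitude. The sketch below runs plain 1-D k-means (Lloyd's algorithm) on log10(nbytes + 1). `kmeans_1d` is a hypothetical helper, the choice of k is left to the user, and the sample sizes are made up.

```python
import math


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means for 1-D data (requires k >= 2). Returns (centres, groups)."""
    xs = sorted(values)
    # start the centres at evenly spaced order statistics of the sorted data
    centres = [xs[(i * (len(xs) - 1)) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iters):
        # assignment step: each value joins its nearest centre
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        # update step: move each centre to the mean of its group (keep it if the group is empty)
        new = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres, groups


# made-up transfer sizes in bytes; the +1 avoids log10(0) for empty transfers
nbytes = [10, 12, 15, 1_000_000, 1_200_000, 10**12, 2 * 10**12]
logs = [math.log10(b + 1) for b in nbytes]
centres, groups = kmeans_1d(logs, 3)
print([len(g) for g in groups])  # [3, 2, 2]
```

This toy version costs O(n·k) per iteration. For tens of millions of rows you would probably work from a histogram of the log values rather than the raw list.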
38 votes, 5 answers
Measuring the "distance" between two multivariate distributions
I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources.
So, say I have two clusters of points A and B, each associated with two values, X and Y, and I want to measure the "distance" between A…
Emile
- 1,097
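Useful search terms here include "two-sample test statistic" and "distance between probability distributions". The sketch below computes one option among several (KL divergence, Wasserstein distance and MMD are others): Székely and Rizzo's energy distance, 2·E|X−Y| − E|X−X′| − E|Y−Y′|. It uses the plug-in estimate over all pairs of points, which is zero when the two samples are identical. The point sets and the helper name `energy_distance` are made up for illustration.

```python
import math
from itertools import product


def energy_distance(a, b):
    """Plug-in estimate of 2*E|X-Y| - E|X-X'| - E|Y-Y'| for two samples of points."""
    def mean_dist(p, q):
        # average Euclidean distance over every pair (x from p, y from q)
        return sum(math.dist(x, y) for x, y in product(p, q)) / (len(p) * len(q))

    return 2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)


# made-up clusters: each point is an (X, Y) pair
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
print(energy_distance(A, A))      # 0.0
print(energy_distance(A, B) > 0)  # True
```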
38 votes, 2 answers
Intuitive explanation of how UMAP works, compared to t-SNE
I have a PhD in molecular biology. My studies have recently started to involve high-dimensional data analysis. I understand how t-SNE works (thanks to a StatQuest video on YouTube) but can't seem to wrap my mind around UMAP (I listened to the UMAP…
Atakan
- 751
38 votes, 1 answer
Comparison between SHAP (Shapley Additive Explanation) and LIME (Local Interpretable Model-Agnostic Explanations)
I am reading up on two popular post hoc model interpretability techniques: LIME and SHAP.
I am having trouble understanding the key difference between these two techniques.
To quote Scott Lundberg, the brains behind SHAP:
SHAP values come with the…
user248884
- 501
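SHAP builds on the Shapley value from cooperative game theory. The sketch below computes Shapley values exactly by enumerating every feature subset. It assumes the simplest "baseline" convention, in which an absent feature is replaced by a single reference value, whereas SHAP itself typically averages over a background dataset. `shapley_values`, the toy model and the baseline are hypothetical. Exact enumeration needs O(2^n) model calls, which is why SHAP relies on approximations and model-specific algorithms for realistic feature counts.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x, with absent features set to baseline."""
    n = len(x)

    def value(present):
        # model output when only the features in `present` take their values from x
        return f([x[j] if j in present else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# hypothetical linear model: each feature's Shapley value is w_i * (x_i - baseline_i)
def model(z):
    return 2 * z[0] - 1 * z[1] + 0.5 * z[2] + 3


phi = shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, -2.0, 1.5]
```

The values always add up to f(x) − f(baseline). This additivity is the property the "Additive" in SHAP refers to, and LIME's local surrogate does not guarantee it.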
38 votes, 4 answers
Data has two trends; how to extract independent trendlines?
I have a set of data that is not ordered in any particular way but, when plotted, clearly shows two distinct trends. A simple linear regression would not be adequate here because of the clear distinction between the two series. Is there a simple…
jonathanbsyd
- 483
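One technique that fits this description, offered as one option rather than the definitive answer, is a hard-assignment mixture of regressions (sometimes called k-lines clustering). It alternates two steps: assign each point to whichever line predicts it with the smaller absolute residual, then refit each line by ordinary least squares. Like k-means, it needs starting guesses and can get stuck in a poor local solution. `k_lines`, `fit_line`, the synthetic data and the initial lines below are all hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def k_lines(points, lines, iters=50):
    """Alternate between assigning points to the closest line and refitting each line."""
    lines = list(lines)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in lines]
        for x, y in points:
            residuals = [abs(y - (m * x + c)) for m, c in lines]
            groups[residuals.index(min(residuals))].append((x, y))
        # refit a line only if its group has at least two distinct x values
        new = [fit_line(g) if len({x for x, _ in g}) >= 2 else lines[j]
               for j, g in enumerate(groups)]
        if new == lines:
            break
        lines = new
    return lines, groups


# synthetic data from two trends: y = 2x + 1 and y = -x + 20
data = [(x, 2 * x + 1) for x in range(11)] + [(x, -x + 20) for x in range(11)]
data.sort(key=lambda p: p[1])  # reorder by y so the two series are interleaved
lines, groups = k_lines(data, [(1.8, 2.0), (-0.8, 19.0)])
print(lines)  # [(2.0, 1.0), (-1.0, 20.0)]
```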
38 votes, 3 answers
Mode, Class and Type of R objects
I was wondering what the differences are between the mode, class and type of R objects.
The type of an R object can be obtained with the typeof() function, its mode with mode(), and its class with class().
Are there any other similar functions or concepts that I have missed?
Thanks…
Tim
- 19,445
38 votes, 4 answers
Origin of "5$\sigma$" threshold for accepting evidence in particle physics?
News reports say that CERN will announce tomorrow that the Higgs boson has been experimentally detected with 5$\sigma$ evidence. According to that article:
5$\sigma$ equates to a 99.99994% chance that the data the CMS and ATLAS detectors are…
Harvey Motulsky
- 20,456
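The 99.99994% in the quote can be reproduced from the standard normal distribution, assuming the test statistic is standard normal under the null hypothesis. It equals 1 minus the two-sided tail probability P(|Z| > 5) ≈ 5.7 × 10⁻⁷. Particle-physics results are usually quoted with the one-sided tail P(Z > 5) ≈ 2.9 × 10⁻⁷, about 1 in 3.5 million. Either number is a p-value computed assuming there is no signal, not the probability that the signal is real. The sketch uses only `math.erfc`, and `normal_tail` is a hypothetical helper.

```python
import math


def normal_tail(k, two_sided=False):
    """P(Z > k) for a standard normal Z, or P(|Z| > k) when two_sided is True."""
    p = 0.5 * math.erfc(k / math.sqrt(2))
    return 2 * p if two_sided else p


one = normal_tail(5)                  # about 2.87e-7, roughly 1 in 3.5 million
two = normal_tail(5, two_sided=True)  # about 5.73e-7
print(f"{1 - two:.5%}")               # 99.99994%
```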
38 votes, 4 answers
Are inconsistent estimators ever preferable?
Consistency is obviously a natural and important property of estimators, but are there situations where it may be better to use an inconsistent estimator rather than a consistent one?
More specifically, are there examples of an inconsistent…
MånsT
- 11,979
38 votes, 2 answers
What does kernel size mean?
When people talk about neural networks, what do they mean when they say "kernel size"? Kernels are similarity functions, but what does that say about kernel size?
quil
- 493
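In convolutional neural networks, "kernel" usually means the small learned filter that slides over the input, rather than the similarity function of kernel methods. The kernel size is the width of that window, for example 3, or 3×3 for images. The 1-D sketch below assumes stride 1 and no padding. Each output is a weighted sum of kernel-size neighbouring inputs, so the output has len(signal) − kernel_size + 1 entries. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped). `conv1d_valid` is a hypothetical helper.

```python
def conv1d_valid(signal, kernel):
    """Stride-1, unpadded 1-D convolution (cross-correlation) of signal with kernel."""
    k = len(kernel)  # the kernel size: how many neighbouring inputs each output sees
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


print(conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1]))             # [-2, -2, -2]
print(len(conv1d_valid(list(range(10)), [1, 1, 1, 1, 1])))  # 6
```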
38 votes, 3 answers
Linearity of PCA
PCA is considered a linear procedure, however:
$$\mathrm{PCA}(X)\neq \mathrm{PCA}(X_1)+\mathrm{PCA}(X_2)+\ldots+\mathrm{PCA}(X_n),$$
where $X=X_1+X_2+\ldots+X_n$. This is to say that the eigenvectors obtained by the PCAs on the data matrices $X_i$…
AlphaOmega
- 707
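The point can be checked numerically on a tiny 2-D example with hypothetical data and helper names. Here X1 varies only along x and X2 only along y, so their leading principal directions are (1, 0) and (0, 1). The leading direction of X = X1 + X2 is not parallel to their sum, because the eigenvectors depend non-linearly on the data through the covariance matrix. What is linear is projection onto a fixed direction, and the last lines check that this map is additive. The leading direction uses the closed form for a 2×2 covariance, tan 2θ = 2·s_xy / (s_xx − s_yy).

```python
import math


def leading_direction(points):
    """Unit vector along the first principal axis of 2-D points (closed form for 2x2)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # the angle maximising the projected variance satisfies tan(2*theta) = 2*sxy / (sxx - syy)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def project(points, w):
    # scores of each point on the fixed direction w
    return [p[0] * w[0] + p[1] * w[1] for p in points]


X1 = [(-2, 0), (-1, 0), (1, 0), (2, 0)]  # varies only along x
X2 = [(0, 1), (0, -1), (0, 1), (0, -1)]  # varies only along y
X = [(a[0] + b[0], a[1] + b[1]) for a, b in zip(X1, X2)]

w1, w2, w = leading_direction(X1), leading_direction(X2), leading_direction(X)
print(tuple(round(c, 3) for c in w))  # (0.957, -0.29), not parallel to w1 + w2 = (1, 1)

# for a FIXED direction the projection is linear: project(X) == project(X1) + project(X2)
lhs = project(X, w)
rhs = [a + b for a, b in zip(project(X1, w), project(X2, w))]
```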
38 votes, 2 answers
Gradient backpropagation through ResNet skip connections
I'm curious about how gradients are back-propagated through a neural network using ResNet modules/skip connections. I've seen a couple of questions about ResNet (e.g. Neural network with skip-layer connections) but this one is asking specifically…
Simon
- 2,341
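The key step can be shown on a scalar toy example. For a residual block y = x + F(x), the chain rule gives dL/dx = dL/dy · (1 + F′(x)). The identity path contributes the constant 1, so the upstream gradient reaches x even when F′(x) is small. The sketch assumes a hypothetical branch F(x) = w·tanh(x) and checks the analytic derivative against a central finite difference.

```python
import math


def residual_block(x, w):
    # hypothetical residual branch F(x) = w * tanh(x); the skip connection adds x back
    return x + w * math.tanh(x)


def residual_block_grad(x, w):
    # d/dx [x + F(x)] = 1 + F'(x); the 1 is the gradient flowing through the skip path
    return 1.0 + w * (1.0 - math.tanh(x) ** 2)


def finite_difference(f, x, h=1e-6):
    # central difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)


x, w = 0.3, 0.7
analytic = residual_block_grad(x, w)
numeric = finite_difference(lambda v: residual_block(v, w), x)
print(abs(analytic - numeric) < 1e-6)  # True
print(residual_block_grad(2.0, 0.0))   # 1.0: with F switched off, the gradient passes through unchanged
```

For a stack of such blocks the local derivatives multiply, so each factor is (1 + F′) rather than F′ alone.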
38 votes, 3 answers
How to estimate shrinkage parameter in Lasso or ridge regression with >50K variables?
I want to use Lasso or ridge regression for a model with more than 50,000 variables, and I want to do so using a software package in R. How can I estimate the shrinkage parameter ($\lambda$)?
Edits:
Here is the point I got up to:
set.seed(123)
Y <- runif…
John
- 2,258
38 votes, 5 answers
Can SVM do stream learning one example at a time?
I have a streaming data set; examples become available one at a time, and I need to do multi-class classification on them. As soon as I feed a training example to the learning process, I have to discard it. Concurrently, I'm also using the…
siamii
- 2,057
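A linear SVM can be trained in this one-pass setting with stochastic sub-gradient descent on the regularised hinge loss. The sketch below follows the Pegasos update (step size 1/(λt), without the optional projection step). It is a swapped-in technique rather than a kernel SVM, and `OnlineLinearSVM`, `lam`, the constant bias feature and the toy stream are assumptions. For multi-class problems, one option is one-vs-rest: keep one such model per class and update all of them with each example before discarding it.

```python
class OnlineLinearSVM:
    """Binary linear SVM updated one example at a time (Pegasos-style, no projection step)."""

    def __init__(self, n_features, lam=0.01):
        self.w = [0.0] * n_features
        self.lam = lam  # regularisation strength lambda
        self.t = 0      # number of examples seen so far

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Take one (x, y) example with y in {-1, +1}; x can be discarded afterwards."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        margin = y * self.decision(x)  # computed with the weights before this step
        # shrink the weights (gradient step on the L2 penalty)
        self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
        # add the hinge-loss sub-gradient only if the example violates the margin
        if margin < 1:
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


# toy stream: the last feature is a constant 1 acting as a bias term
model = OnlineLinearSVM(n_features=2)
stream = [((2.0, 1.0), 1), ((-2.0, 1.0), -1), ((1.5, 1.0), 1), ((-1.5, 1.0), -1)] * 50
for x, y in stream:
    model.update(x, y)  # each example is used once, then dropped
print(model.predict((3.0, 1.0)), model.predict((-3.0, 1.0)))  # 1 -1
```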
38 votes, 3 answers
Is it possible to calculate AIC and BIC for lasso regression models?
Is it possible to calculate AIC or BIC values for lasso regression models and other regularized models where parameters only partially enter the equation? How does one determine the degrees of freedom?
I'm using R to fit lasso regression…
Jota
- 894
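Under a Gaussian error assumption, one widely used answer comes from Zou, Hastie and Tibshirani (2007): the number of nonzero lasso coefficients is an unbiased estimate of the lasso's degrees of freedom. That count can be plugged into the usual formulas AIC = n·log(RSS/n) + 2·df and BIC = n·log(RSS/n) + log(n)·df, each up to an additive constant. The sketch assumes you already have the residuals and coefficients from a fit at one value of λ, and `lasso_information_criteria` is a hypothetical helper. For ridge regression the nonzero count does not apply; its effective degrees of freedom are usually taken as the trace of the hat matrix.

```python
import math


def lasso_information_criteria(residuals, coefs, tol=1e-12):
    """Gaussian AIC and BIC (up to an additive constant) for one fitted lasso model.

    Degrees of freedom are estimated by the number of nonzero coefficients;
    add 1 yourself if you want to count an unpenalised intercept.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    df = sum(1 for b in coefs if abs(b) > tol)
    base = n * math.log(rss / n)
    aic = base + 2 * df
    bic = base + math.log(n) * df
    return aic, bic


# made-up residuals and coefficients, only to show the arithmetic
aic, bic = lasso_information_criteria([1.0, -1.0, 1.0, -1.0], [0.5, 0.0, -0.2, 0.0])
print(aic, round(bic, 4))  # 4.0 2.7726
```

Computing these along the whole λ path and picking the minimum is one alternative to cross-validation.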