Most Popular
1500 questions
35
votes
3 answers
What stop-criteria for agglomerative hierarchical clustering are used in practice?
I have found extensive literature proposing all sorts of criteria (e.g. Glenn et al. 1985 (pdf) and Jung et al. 2002 (pdf)). However, most of these are not that easy to implement (at least from my perspective). I am using scipy.cluster.hierarchy to…
Björn Pollex
- 1,383
35
votes
1 answer
What are some useful guidelines for GBM parameters?
What are some useful guidelines for testing parameters (i.e. interaction depth, minchild, sample rate, etc.) using GBM?
Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…
Ram Ahluwalia
- 3,081
35
votes
3 answers
Things to consider about masters programs in statistics
It is admission season for graduate schools. I (and many students like me) am now trying to decide which statistics program to pick.
What are some things those of you who work with statistics suggest we consider about masters programs in…
AttemptedStudent
- 151
35
votes
11 answers
Why is generating 8 random bits uniform on (0, 255)?
I am generating 8 random bits (either a 0 or a 1) and concatenating them together to form an 8-bit number. A simple Python simulation yields a uniform distribution on the discrete set [0, 255].
I am trying to justify why this makes sense in my…
glassy
- 481
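The uniformity asked about in the excerpt follows from independence: each of the 256 bit patterns has probability (1/2)^8. A minimal Python simulation of the setup described (sample size and seed are illustrative):

```python
import random
from collections import Counter

def random_byte():
    """Concatenate 8 independent fair bits into one integer in [0, 255]."""
    value = 0
    for _ in range(8):
        value = (value << 1) | random.getrandbits(1)
    return value

random.seed(0)
counts = Counter(random_byte() for _ in range(100_000))
# All 256 values appear, each with a count close to 100_000 / 256 ≈ 391.
```

Since every pattern of 8 bits is equally likely and the map to integers is a bijection, the resulting distribution on {0, …, 255} is uniform.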
35
votes
7 answers
Convolutional Layers: To pad or not to pad?
AlexNet architecture uses zero-paddings as shown in the pic. However, there is no explanation in the paper why this padding is introduced.
Stanford CS 231n course teaches that we use padding to preserve the spatial size:
I am curious if that is the…
Jumabek Alihanov
- 403
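The size-preservation claim in the excerpt reduces to one formula; a minimal sketch of the arithmetic (the 13×13 map and 3×3 kernel are illustrative values, not taken from the question):

```python
# Output size of a conv layer: out = floor((n + 2*p - k) / s) + 1,
# where n = input size, k = kernel size, p = padding, s = stride.
def conv_output_size(n, k, p=0, s=1):
    return (n + 2 * p - k) // s + 1

# A 3x3 conv on a 13x13 feature map:
same = conv_output_size(13, 3, p=1)   # padding p=(k-1)//2 preserves size -> 13
valid = conv_output_size(13, 3, p=0)  # no padding shrinks the map       -> 11
```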
35
votes
4 answers
How to measure smoothness of a time series in R?
Is there a good way to measure smoothness of a time series in R? For example,
-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0
is much smoother than
-1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0
although they have same mean and…
agmao
- 451
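The question asks for an R measure; one common choice there is the one-liner sd(diff(x)), i.e. the standard deviation of lag-1 differences. A Python sketch of the same idea, using the two series from the excerpt:

```python
from statistics import stdev

def diff_sd(x):
    """Standard deviation of lag-1 differences: small for smooth series."""
    return stdev(b - a for a, b in zip(x, x[1:]))

smooth = [-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0]
rough  = [-1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0]
# diff_sd(smooth) is ~0 (constant steps); diff_sd(rough) is large.
```

Both series share the same values up to sign, hence similar mean and spread, yet the difference-based measure separates them cleanly.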
35
votes
2 answers
PCA in numpy and sklearn produces different results
Am I misunderstanding something? This is my code
using sklearn
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
from sklearn.preprocessing…
aceminer
- 1,043
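Two frequent causes of such a mismatch are forgetting that sklearn's PCA centers the data first, and the arbitrary sign of each principal axis. A numpy-only sketch (random data, no sklearn) showing the sign ambiguity between the eigendecomposition and SVD routes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)              # sklearn's PCA centers the data first

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]    # eigh returns eigenvalues in ascending order
components_eig = eigvecs[:, order].T

# Route 2: SVD of the centered data, which is what sklearn uses internally.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components_svd = Vt

# The two agree only up to the sign of each component.
assert np.allclose(np.abs(components_eig), np.abs(components_svd))
```

Comparing absolute values (or flipping signs to a convention) is the usual way to check that two PCA implementations agree.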
35
votes
3 answers
Is whitening always good?
A common pre-processing step for machine learning algorithms is whitening of data.
It seems like it is always good to do whitening since it de-correlates the data, making it simpler to model.
When is whitening not recommended?
Note: I'm referring to…
Ran
- 1,626
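For reference, a minimal numpy sketch of PCA-whitening on synthetic data; the eps guard hints at the failure mode the question is after: directions with near-zero variance are mostly noise, and whitening amplifies them.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """PCA-whitening sketch: rotate onto the principal axes and rescale
    each direction to unit variance. eps guards near-zero eigenvalues,
    where whitening would mostly amplify noise."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return Xc @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
W = whiten(X)
# The covariance of the whitened data is (approximately) the identity.
```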
35
votes
3 answers
How to build the final model and tune probability threshold after nested cross-validation?
Firstly, apologies for posting a question that has already been discussed at length here, here, here, here, here, and for reheating an old topic. I know @DikranMarsupial has written about this topic at length in posts and journal papers, but I'm…
Dr. Andrew John Lowe
- 451
35
votes
3 answers
What is the most accurate way of determining an object's color?
I have written a computer program that can detect coins in a static image (.jpeg, .png, etc.) using some standard techniques for computer vision (Gaussian Blur, thresholding, Hough-Transform etc.). Using the ratios of the coins picked up from a…
MoonKnight
- 717
35
votes
5 answers
Think like a Bayesian, check like a frequentist: What does that mean?
I am looking at some lecture slides on a data science course which can be found here:
https://github.com/cs109/2015/blob/master/Lectures/01-Introduction.pdf
I, unfortunately, cannot see the video for this lecture and at one point on the slide, the…
Luca
- 4,650
35
votes
2 answers
Raw residuals versus standardised residuals versus studentised residuals - what to use when?
This looks like a similar question and didn't get many responses.
Omitting tests such as Cook's D, and just looking at residuals as a group, I am interested in how others use residuals when assessing goodness-of-fit. I use the raw residuals:
in a…
Michelle
- 3,900
35
votes
3 answers
How and why does Batch Normalization use moving averages to track the accuracy of the model as it trains?
I was reading the batch normalization (BN) paper (1) and didn't understand the need to use moving averages to track the accuracy of the model and even if I accepted that it was the right thing to do, I don't understand what they are doing…
Charlie Parker
- 6,866
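The moving averages in question do not track accuracy; they track the batch statistics that inference needs once mini-batches are gone. A pure-Python sketch of the usual exponential-moving-average update (momentum value and data are illustrative, not from the paper):

```python
import random

# During training, batch norm normalizes with per-batch statistics, but at
# test time it needs population estimates. The common fix: maintain
# exponential moving averages of the batch mean and variance.
momentum = 0.99
running_mean, running_var = 0.0, 1.0

random.seed(0)
for _ in range(2000):
    batch = [random.gauss(5.0, 2.0) for _ in range(32)]
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    running_mean = momentum * running_mean + (1 - momentum) * mean
    running_var = momentum * running_var + (1 - momentum) * var
# running_mean ≈ 5 and running_var ≈ 4: these frozen values replace the
# batch statistics at inference time.
```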
35
votes
2 answers
If the Epanechnikov kernel is theoretically optimal when doing Kernel Density Estimation, why isn't it more commonly used?
I have read (for example, here) that the Epanechnikov kernel is optimal, at least in a theoretical sense, when doing kernel density estimation. If this is true, then why does the Gaussian show up so frequently as the default kernel, or in many…
John Rauser
- 451
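A minimal pure-Python comparison of the two kernels in a KDE (data and bandwidth are made up): both yield valid densities, and the practical difference is that the Gaussian estimate is infinitely differentiable everywhere while the Epanechnikov estimate has compact support and kinks at its edges.

```python
import math

def kde(x, data, h, kernel):
    """Kernel density estimate at point x with bandwidth h."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

data = [1.2, 1.9, 2.1, 2.8, 3.3]
# Both estimates integrate to 1; only their smoothness and tails differ.
```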
35
votes
2 answers
How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?
I want to generate the plot described in the book ElemStatLearn "The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition" by Trevor Hastie, Robert Tibshirani & Jerome Friedman. The plot is:
I am wondering how I…
littleEinstein
- 533
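A sketch of the standard recipe for such plots, with a from-scratch k-NN so it stays self-contained (the toy blobs are illustrative, not the book's mixture data): classify a dense grid of points, then contour the grid labels, e.g. with matplotlib's contourf.

```python
from collections import Counter

def knn_predict(point, X, y, k=15):
    """Vanilla k-NN: majority label among the k nearest training points."""
    nearest = sorted(range(len(X)),
                     key=lambda i: (X[i][0] - point[0]) ** 2
                                 + (X[i][1] - point[1]) ** 2)
    return Counter(y[i] for i in nearest[:k]).most_common(1)[0][0]

# Toy training set: two well-separated blobs.
X = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (2.0, 2.0), (1.8, 2.2), (2.1, 1.9)]
y = [0, 0, 0, 1, 1, 1]

# Evaluate on a grid; plotting the grid labels draws the decision boundary.
grid = [[knn_predict((i * 0.5, j * 0.5), X, y, k=3) for i in range(5)]
        for j in range(5)]
```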