Most Popular

1500 questions
35 votes • 3 answers

What stop-criteria for agglomerative hierarchical clustering are used in practice?

I have found extensive literature proposing all sorts of criteria (e.g. Glenn et al. 1985 (pdf) and Jung et al. 2002 (pdf)). However, most of these are not that easy to implement (at least from my perspective). I am using scipy.cluster.hierarchy to…
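One practical stop criterion available directly in scipy.cluster.hierarchy is cutting the dendrogram either at a distance threshold or at a fixed number of clusters with `fcluster`. A minimal sketch on toy data (the data and thresholds here are illustrative, not from the question):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated blobs as toy data
X = np.vstack([rng.normal(0, 0.5, (20, 2)),
               rng.normal(5, 0.5, (20, 2))])

Z = linkage(X, method="ward")

# Stop criterion 1: cut the dendrogram at a cophenetic-distance threshold
labels_dist = fcluster(Z, t=5.0, criterion="distance")

# Stop criterion 2: simply request a fixed number of clusters
labels_k = fcluster(Z, t=2, criterion="maxclust")

print(len(set(labels_k)))  # 2
```

Both criteria are "stop rules" in the sense that they decide where merging ends; choosing the threshold itself is the harder problem the literature above addresses.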
35 votes • 1 answer

What are some useful guidelines for GBM parameters?

What are some useful guidelines for testing parameters (i.e. interaction depth, minchild, sample rate, etc.) using GBM? Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…
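A common way to structure such parameter testing is a cross-validated grid search. A hedged sketch using scikit-learn's `GradientBoostingClassifier` (the question likely refers to R's gbm package, where the parameter names differ: interaction depth maps to `max_depth`, minchild to `min_samples_leaf`, sample rate to `subsample`; the data here is a small synthetic stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic stand-in for the 200,000-row problem
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Grid over the parameters the question names
param_grid = {
    "max_depth": [3, 4],     # "interaction depth"
    "subsample": [0.5, 0.8], # "sample rate"
}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_["max_depth"] in (3, 4))  # True
```

With 70–100 features and 200,000 rows, a full grid is expensive, which is exactly why heuristic guidelines for narrowing the grid are worth asking about.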
35 votes • 3 answers

Things to consider about masters programs in statistics

It is admission season for graduate schools. I (and many students like me) am now trying to decide which statistics program to pick. What are some things those of you who work with statistics suggest we consider about masters programs in…
35 votes • 11 answers

Why is generating 8 random bits uniform on [0, 255]?

I am generating 8 random bits (either a 0 or a 1) and concatenating them together to form an 8-bit number. A simple Python simulation yields a uniform distribution on the discrete set [0, 255]. I am trying to justify why this makes sense in my…
glassy • 481
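The justification the question is after: each of the 2⁸ = 256 bit patterns arises with probability (1/2)⁸, since the 8 bits are independent and fair, so the concatenated integer is uniform on {0, …, 255}. A quick simulation sketch:

```python
import random

random.seed(0)

def rand_byte():
    # Concatenate 8 independent fair bits into one integer:
    # each of the 2**8 = 256 patterns has probability (1/2)**8,
    # so the result is uniform on {0, ..., 255}.
    value = 0
    for _ in range(8):
        value = (value << 1) | random.getrandbits(1)
    return value

samples = [rand_byte() for _ in range(100_000)]
print(min(samples), max(samples))  # 0 255
```

With 100,000 draws, every one of the 256 values appears many times, and a histogram is flat up to sampling noise.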
35 votes • 7 answers

Convolutional Layers: To pad or not to pad?

The AlexNet architecture uses zero-padding as shown in the picture. However, there is no explanation in the paper of why this padding is introduced. The Stanford CS 231n course teaches that we use padding to preserve the spatial size: I am curious if that is the…
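The size-preservation claim is easy to check with the standard convolution output-size formula, out = ⌊(n + 2p − k)/s⌋ + 1:

```python
def conv_out_size(n, k, p, s):
    """Spatial output size of a convolution:
    n = input size, k = kernel size, p = zero-padding, s = stride."""
    return (n + 2 * p - k) // s + 1

# 'Same' padding (p = (k-1)/2) preserves spatial size at stride 1:
print(conv_out_size(32, k=3, p=1, s=1))  # 32
# Without padding the feature map shrinks by k-1 per layer:
print(conv_out_size(32, k=3, p=0, s=1))  # 30
# AlexNet's first layer: 227x227 input, 11x11 kernel, stride 4:
print(conv_out_size(227, k=11, p=0, s=4))  # 55
```

Without padding, stacking many layers would shrink the feature map to nothing, and border pixels would contribute to fewer outputs than central ones.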
35 votes • 4 answers

How to measure smoothness of a time series in R?

Is there a good way to measure the smoothness of a time series in R? For example, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0 is much smoother than -1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0 although they have the same mean and…
agmao • 451
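One simple heuristic (not the only possible measure) is the standard deviation of the first differences, which is small for the smooth series and large for the oscillating one. Sketched here in Python on the question's two example series; the R analogue would be `sd(diff(x))`:

```python
import numpy as np

def diff_sd(x):
    # Standard deviation of first differences: near zero for a
    # smooth, steadily changing series; large for an oscillating one.
    return np.std(np.diff(x))

smooth = np.array([-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0])
jagged = np.array([-1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0])

print(diff_sd(smooth) < diff_sd(jagged))  # True
```

The smooth series has constant differences of 0.2, so its diff-SD is essentially zero, even though both series share the same mean and range.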
35 votes • 2 answers

PCA in numpy and sklearn produces different results

Am I misunderstanding something? This is my code using sklearn: import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from sklearn import decomposition from sklearn import datasets from sklearn.preprocessing…
aceminer • 1,043
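Two usual causes of such discrepancies are centering (sklearn's PCA centers the data internally; a raw SVD of the uncentered matrix does not) and sign ambiguity (each principal axis is defined only up to ±1). A pure-numpy sketch, assuming toy data, showing the two routes agree up to sign once both operate on centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data with three well-separated variance scales
X = rng.normal(size=(100, 3)) @ np.diag([2.0, 1.0, 0.5])

# Route 1: SVD of the *centered* data (what sklearn's PCA does internally)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # sort by decreasing variance

# The axes agree only up to sign, so compare absolute values
print(np.allclose(np.abs(Vt.T), np.abs(eigvecs), atol=1e-8))  # True
```

If one library flips a component's sign relative to the other, projections differ by a sign too, which often looks like "different results" when the decompositions are in fact equivalent.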
35 votes • 3 answers

Is whitening always good?

A common pre-processing step for machine learning algorithms is whitening of data. It seems like it is always good to do whitening since it de-correlates the data, making it simpler to model. When is whitening not recommended? Note: I'm referring to…
Ran • 1,626
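For concreteness, PCA whitening rotates onto the covariance eigenbasis and rescales each direction to unit variance; because it divides by the square root of each eigenvalue, near-zero-variance (noise) directions get amplified enormously, which is one standard case where whitening is not recommended. A minimal numpy sketch on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated toy data
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])

# PCA whitening: rotate onto the covariance eigenbasis and rescale
# each direction to unit variance. Dividing by sqrt(eigenvalue)
# blows up directions whose eigenvalue is near zero.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = eigvecs / np.sqrt(eigvals)  # whitening matrix
Xw = Xc @ W

# The whitened covariance is (approximately) the identity
print(np.allclose(np.cov(Xw, rowvar=False), np.eye(2), atol=1e-8))  # True
```

In practice a small constant is often added to the eigenvalues before the division precisely to keep those low-variance directions from dominating.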
35 votes • 3 answers

How to build the final model and tune probability threshold after nested cross-validation?

Firstly, apologies for posting a question that has already been discussed at length here, here, here, here, here, and for reheating an old topic. I know @DikranMarsupial has written about this topic at length in posts and journal papers, but I'm…
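Separate from how the final model is built, the threshold-tuning half of the question can be sketched simply: collect out-of-fold probabilities from cross-validation, then pick the threshold that maximizes a chosen metric (F1 here) on that held-out data. A numpy-only illustration with hypothetical toy probabilities (all names and values are illustrative):

```python
import numpy as np

def f1(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(y_true, proba):
    # Scan candidate thresholds over the out-of-fold probabilities
    # and keep the one that maximizes F1 on the held-out data.
    candidates = np.unique(proba)
    scores = [f1(y_true, (proba >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]

# Toy out-of-fold probabilities: positives score higher on average
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.1, 0.2, 0.35, 0.6, 0.55, 0.7, 0.8, 0.9])
t = tune_threshold(y, p)
print(0 < t < 1)  # True
```

The key point in the nested-CV discussions is that the threshold must be tuned on predictions the model did not train on, otherwise its estimated performance is optimistic.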
35 votes • 3 answers

What is the most accurate way of determining an object's color?

I have written a computer program that can detect coins in a static image (.jpeg, .png, etc.) using some standard techniques for computer vision (Gaussian blur, thresholding, Hough transform, etc.). Using the ratios of the coins picked up from a…
35 votes • 5 answers

Think like a Bayesian, check like a frequentist: what does that mean?

I am looking at some lecture slides on a data science course which can be found here: https://github.com/cs109/2015/blob/master/Lectures/01-Introduction.pdf I, unfortunately, cannot see the video for this lecture and at one point on the slide, the…
Luca • 4,650
35 votes • 2 answers

Raw residuals versus standardised residuals versus studentised residuals - what to use when?

This looks like a similar question and didn't get many responses. Omitting tests such as Cook's D, and just looking at residuals as a group, I am interested in how others use residuals when assessing goodness-of-fit. I use the raw residuals: in a…
Michelle • 3,900
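For reference, the three flavours differ only in scaling: raw residuals are e_i; standardized (internally studentized) residuals divide by s·√(1−h_ii) using the leverage h_ii; externally studentized residuals instead use the residual standard deviation s_(i) computed with observation i left out. A numpy sketch on toy regression data, using the standard leave-one-out variance identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                  # raw residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix
h = np.diag(H)                                    # leverages
s2 = e @ e / (n - p)                              # residual variance

standardized = e / np.sqrt(s2 * (1 - h))
# Externally studentized: leave-one-out variance via the identity
# (n-p-1) * s2_(i) = SSE - e_i^2 / (1 - h_i)
s2_i = (e @ e - e**2 / (1 - h)) / (n - p - 1)
studentized = e / np.sqrt(s2_i * (1 - h))

print(standardized.shape == studentized.shape == (30,))  # True
```

Externally studentized residuals follow a t distribution under the model, which is what makes them convenient for flagging outliers; raw residuals have unequal variances across observations, which is the usual argument against using them directly.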
35 votes • 3 answers

How and why does Batch Normalization use moving averages to track the accuracy of the model as it trains?

I was reading the batch normalization (BN) paper (1) and didn't understand the need to use moving averages to track the accuracy of the model and even if I accepted that it was the right thing to do, I don't understand what they are doing…
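For context: the moving averages in BN do not track accuracy at all; they maintain running estimates of each feature's mean and variance during training so that, at inference time, a single example can be normalized without needing a batch. A numpy sketch of the exponential-moving-average update (the momentum value is a common default, not from the paper's excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)
momentum = 0.9            # decay of the running estimates
running_mean = np.zeros(4)
running_var = np.ones(4)

for _ in range(500):
    batch = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
    # During training, normalization itself uses the *batch*
    # statistics; the running averages are only updated on the side:
    running_mean = momentum * running_mean + (1 - momentum) * batch.mean(axis=0)
    running_var = momentum * running_var + (1 - momentum) * batch.var(axis=0)

# At inference the running estimates stand in for batch statistics
print(np.allclose(running_mean, 3.0, atol=0.5))  # True
```

The averages converge to the population mean (3.0) and variance (4.0) of the feature stream, which is exactly what a lone test example should be normalized by.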
35 votes • 2 answers

If the Epanechnikov kernel is theoretically optimal when doing Kernel Density Estimation, why isn't it more commonly used?

I have read (for example, here) that the Epanechnikov kernel is optimal, at least in a theoretical sense, when doing kernel density estimation. If this is true, then why does the Gaussian show up so frequently as the default kernel, or in many…
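For reference, the Epanechnikov kernel is K(u) = 0.75·(1 − u²) on |u| ≤ 1: optimal for mean integrated squared error, but compactly supported and not smooth at |u| = 1, which is part of why the infinitely differentiable Gaussian is often preferred in practice. A minimal numpy KDE sketch (bandwidth and data are illustrative):

```python
import numpy as np

def epanechnikov_kde(x_grid, data, h):
    # Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1;
    # the estimate is the average of scaled kernels centered at
    # each data point: (1 / (n h)) * sum_i K((x - x_i) / h).
    u = (x_grid[:, None] - data[None, :]) / h
    K = 0.75 * np.clip(1 - u**2, 0, None)
    return K.mean(axis=1) / h

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-5, 5, 1001)
density = epanechnikov_kde(grid, data, h=0.5)

# The estimate is a valid density: nonnegative everywhere
print(bool(np.all(density >= 0)))  # True
```

Numerically integrating `density` over the grid gives approximately 1, as a density should; the practical point is that the efficiency loss from using a Gaussian instead is small.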
35 votes • 2 answers

How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?

I want to generate the plot described in the book "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Second Edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (ElemStatLearn). The plot is: I am wondering how I…
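The general recipe behind that figure is to classify every point of a dense grid over the plotting region and draw the contour where the predicted label changes. A self-contained numpy sketch with a hand-rolled k-NN vote on toy two-class data (the book's figure uses k = 15 on its "mixture" dataset, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class training data standing in for the book's mixture data
X = np.vstack([rng.normal(-1, 0.7, (30, 2)), rng.normal(1, 0.7, (30, 2))])
y = np.repeat([0, 1], 30)

def knn_predict(grid_pts, X, y, k=15):
    # Label each grid point by majority vote among its k nearest
    # training points (squared Euclidean distance; odd k avoids ties).
    d2 = ((grid_pts[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y[nearest].mean(axis=1) > 0.5).astype(int)

# Dense grid over the plotting region
xx, yy = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
grid = np.column_stack([xx.ravel(), yy.ravel()])
labels = knn_predict(grid, X, y).reshape(xx.shape)

print(labels.shape)  # (100, 100)
```

Passing `xx`, `yy`, and `labels` to `plt.contourf` (or `plt.contour` at level 0.5) then renders the filled regions and the decision boundary, with the training points scattered on top.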