Highest Voted Questions - Statistical Analysis Stack Exchange

35

votes

3 answers

What's in a name: Precision (inverse of variance)

Intuitively, the mean is just the average of observations. The variance is how much these observations vary from the mean. I would like to know why the inverse of the variance is known as the precision. What intuition can we make from this? And why…

asked May 08 '16 at 06:39

cgo

9,107
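
As a minimal sketch of the definition this question refers to: precision is the reciprocal of the variance, so tightly clustered observations have high precision. The sample values and the `precision` helper are made up for illustration, and the population variance is an assumed choice.

```python
import statistics

# Hypothetical samples: both have mean 10, but very different spread.
tight = [9.9, 10.0, 10.1, 10.0, 10.0]
loose = [5.0, 15.0, 10.0, 2.0, 18.0]


def precision(values):
    """Return 1 / variance, using the population variance (an assumption)."""
    return 1.0 / statistics.pvariance(values)


# The tightly clustered sample has the larger precision.
print(precision(tight))
print(precision(loose))
```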

35

votes

3 answers

Should final (production ready) model be trained on complete data or just on training set?

Suppose I trained several models on the training set, chose the best one using the cross-validation set, and measured its performance on the test set. So now I have one final best model. Should I retrain it on all my available data or ship the solution trained only on…

asked Nov 29 '15 at 11:40

Yurii

1,964

35

votes

1 answer

Difference between Hidden Markov models and Particle Filter (and Kalman Filter)

Here is my old question. I would like to ask whether anyone knows the difference (if there is any) between Hidden Markov models (HMM) and Particle Filter (PF), and, as a consequence, Kalman Filter, or under which circumstances we use which…

asked Nov 23 '15 at 10:32

user5584748

451

35

votes

5 answers

Fisher's Exact Test in contingency tables larger than 2x2

I was taught to only apply Fisher's Exact Test in contingency tables that were 2x2. Questions: Did Fisher himself ever envision this test to be used in tables larger than 2x2 (I am aware of the tale of him devising the test while trying to guess…

asked Aug 17 '10 at 23:42

pmgjones

5,773

35

votes

4 answers

Can anyone explain conjugate priors in simplest possible terms?

I have been trying to understand the idea of conjugate priors in Bayesian statistics for a while but I simply don't get it. Can anyone explain the idea in the simplest possible terms, perhaps using the "Gaussian prior" as an example?

asked Oct 13 '15 at 01:44

Jenna Maiz

899

35

votes

3 answers

Negative binomial distribution vs binomial distribution

What is the difference between the negative binomial distribution and the binomial distribution? I tried reading online, and I found that the negative binomial distribution is used when data points are discrete, but I think even the binomial…

asked Oct 08 '15 at 10:53

alily

616

35

votes

7 answers

Why is it bad to teach students that p-values are the probability that findings are due to chance?

Can someone please offer a nice, succinct explanation of why it is not a good idea to teach students that a p-value is the prob(their findings are due to [random] chance)? My understanding is that a p-value is the prob(getting more extreme data | null…

asked Oct 13 '11 at 02:55

Patrick

1,571

35

votes

3 answers

How to find confidence intervals for ratings?

Evan Miller's "How Not to Sort by Average Rating" proposes using the lower bound of a confidence interval to get a sensible aggregate "score" for rated items. However, it's working with a Bernoulli model: ratings are either thumbs up or thumbs…

asked Sep 23 '11 at 16:41

Peter Taylor

403
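
For the binary (thumbs up / thumbs down) case the excerpt describes, the lower bound can be sketched as follows, assuming the Wilson score interval is the interval in question. The function name, the default `z=1.96` (roughly 95% confidence), and the zero-ratings convention are illustrative choices, and this does not address the multi-level ratings the question goes on to ask about.

```python
import math


def wilson_lower_bound(ups, total, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    ups   -- number of positive ratings
    total -- total number of ratings
    z     -- normal quantile (1.96 is roughly a 95% interval; an assumption)
    """
    if total == 0:
        return 0.0  # no ratings: assumed convention, not part of the formula
    p = ups / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom


# Same 90% positive share, but more ratings give a higher lower bound.
print(wilson_lower_bound(90, 100))
print(wilson_lower_bound(9, 10))
```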

35

votes

3 answers

Metrics for evaluating ranking algorithms

I'm interested in looking at several different metrics for ranking algorithms. There are a few listed on the Learning to Rank Wikipedia page, including: • Mean average precision (MAP); • DCG and NDCG; • Precision@n, NDCG@n, where "@n" denotes that…

asked Jul 02 '15 at 15:30

anthr

937

35

votes

4 answers

Ensemble of different kinds of regressors using scikit-learn (or any other python framework)

I am trying to solve a regression task. I found that 3 models work nicely for different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that when I make predictions using all 3 of these models and then make a table of…

asked Feb 24 '15 at 14:29

Maksim Khaitovich

668

35

votes

7 answers

Inference vs. estimation?

What are the differences between "inference" and "estimation" in the context of machine learning? As a newbie, I feel that we infer random variables and estimate the model parameters. Is my understanding right? If not, what are the…

asked Jan 01 '15 at 08:12

Sibbs Gambling

2,609

35

votes

2 answers

Is there a boxplot variant for Poisson distributed data?

I'd like to know if there is a boxplot variant adapted to Poisson distributed data (or possibly other distributions)? With a Gaussian distribution, whiskers placed at L = Q1 - 1.5 IQR and U = Q3 + 1.5 IQR, the boxplot has the property that there…

asked Jul 15 '11 at 11:19

caas

535
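
The standard whisker limits quoted in this question, L = Q1 - 1.5 IQR and U = Q3 + 1.5 IQR, can be computed with a short sketch like the one below. The helper name `whisker_limits` and the sample data are hypothetical, and the quartile method is whatever `statistics.quantiles` uses by default, so other software may give slightly different quartiles. It shows only the usual rule, not a Poisson-adapted variant.

```python
import statistics


def whisker_limits(values):
    """Return (L, U) with L = Q1 - 1.5*IQR and U = Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical count data with one large value.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 14]
lower, upper = whisker_limits(counts)

# Points outside [L, U] are the ones a boxplot would draw individually.
flagged = [x for x in counts if x < lower or x > upper]
print(lower, upper, flagged)
```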

35

votes

2 answers

What does an Added Variable Plot (Partial Regression Plot) explain in a multiple regression?

I have a model of the Movies dataset, and I used the regression: model <- lm(imdbVotes ~ imdbRating + tomatoRating + tomatoUserReviews+ I(genre1 ** 3.0) +I(genre2 ** 2.0)+I(genre3 ** 1.0), data = movies) library(ggplot2) res <- qplot(fitted(model),…

asked Nov 26 '14 at 12:31

Abhishek Choudhary

493

35

votes

5 answers

How to generate a large full-rank random correlation matrix with some strong correlations present?

I would like to generate a random correlation matrix $\mathbf C$ of $n \times n$ size such that there are some moderately strong correlations present: square real symmetric matrix of $n \times n$ size, with e.g. $n=100$; positive-definite, i.e.…

asked Nov 18 '14 at 15:35

amoeba

104,745

35

votes

13 answers

What statistical blogs would you recommend?

What statistical research blogs would you recommend, and why?

references

asked Jul 19 '10 at 21:00

csgillespie

13,029
