Most Popular

1500 questions
41
votes
3 answers

How to present results of a Lasso using glmnet?

I would like to find predictors for a continuous dependent variable out of a set of 30 independent variables. I am using Lasso regression as implemented in the glmnet package in R. Here is some dummy code: # generate a dummy dataset with 30…
jokel
  • 2,763
41
votes
16 answers

What is the intuition behind the formula for conditional probability?

The formula for the conditional probability of $\text{A}$ happening given that $\text{B}$ has happened is: $$ P\left(\text{A}~\middle|~\text{B}\right)=\frac{P\left(\text{A} \cap \text{B}\right)}{P\left(\text{B}\right)}. $$ My textbook explains the…
WorldGov
  • 755
41
votes
8 answers

How to test hypothesis of no group differences?

Imagine you have a study with two groups (e.g., males and females) looking at a numeric dependent variable (e.g., intelligence test scores) and you have the hypothesis that there are no group differences. Question: What is a good way to test…
Jeromy Anglim
  • 44,984
41
votes
10 answers

Why is 600 out of 1000 more convincing than 6 out of 10?

Look at this excerpt from "The study skills handbook", Palgrave, 2012, by Stella Cottrell, page 155: Percentages Notice when percentages are given. Suppose, instead, the statement above read: 60% of people preferred oranges; 40% said they…
Juya
  • 663
41
votes
5 answers

What's considered a good log loss?

I'm trying to better understand log loss and how it works but one thing I can't seem to find is putting the log loss number into some sort of context. If my model has a log loss of 0.5, is that good? What's considered a good and bad score? How do…
user1923975
  • 575
41
votes
4 answers

How does one measure the non-uniformity of a distribution?

I'm trying to come up with a metric for measuring non-uniformity of a distribution for an experiment I'm running. I have a random variable that should be uniformly distributed in most cases, and I'd like to be able to identify (and possibly measure…
JJC
  • 513
41
votes
6 answers

How did scientists figure out the shape of the normal distribution probability density function?

This is probably an amateur question, but I am interested in how scientists came up with the shape of the normal distribution probability density function. Basically what bugs me is that for someone it would perhaps be more intuitive that…
bonehead
  • 556
41
votes
2 answers

How to draw valid conclusions from "big data"?

"Big data" is everywhere in the media. Everybody says that "big data" is the big thing for 2012, e.g. KDNuggets poll on hot topics for 2012. However, I have deep concerns here. With big data, everybody seems to be happy just to get anything out. But…
41
votes
5 answers

How to create an arbitrary covariance matrix

For example, in R, the MASS::mvrnorm() function is useful for generating data to demonstrate various things in statistics. It takes a mandatory Sigma argument which is a symmetric matrix specifying the covariance matrix of the variables. How would…
rsl
  • 1,075
41
votes
1 answer

Is there Factor analysis or PCA for ordinal or binary data?

I have completed the principal component analysis (PCA), exploratory factor analysis (EFA), and confirmatory factor analysis (CFA), treating data with a Likert scale (5-level responses: none, a little, some, ...) as a continuous variable. Then, using…
user116948
  • 453
41
votes
3 answers

How do bottleneck architectures work in neural networks?

We define a bottleneck architecture as the type found in the ResNet paper where [two 3x3 conv layers] are replaced by [one 1x1 conv, one 3x3 conv, and another 1x1 conv layer]. I understand that the 1x1 conv layers are used as a form of dimension…
41
votes
8 answers

Under what conditions should one use multilevel/hierarchical analysis?

Under which conditions should someone consider using multilevel/hierarchical analysis as opposed to more basic/traditional analyses (e.g., ANOVA, OLS regression, etc.)? Are there any situations in which this could be considered mandatory? Are there…
Patrick
  • 743
41
votes
7 answers

Is there an accepted definition for the median of a sample on the plane, or higher ordered spaces?

If so, what? If not, why not? For a sample on the line, the median minimizes the total absolute deviation. It would seem natural to extend the definition to $\mathbb{R}^2$, etc., but I've never seen it. But then, I've been out in left field for a long time.
phv3773
  • 511
41
votes
2 answers

Is there a reliable nonparametric confidence interval for the mean of a skewed distribution?

Very skewed distributions such as the log-normal do not result in accurate bootstrap confidence intervals. Here is an example showing that the left and right tail areas are far from the ideal 0.025 no matter which bootstrap method you try in…
Frank Harrell
  • 91,879
41
votes
4 answers

Measures of similarity or distance between two covariance matrices

Are there any measures of similarity or distance between two symmetric covariance matrices (both having the same dimensions)? I am thinking here of analogues to KL divergence of two probability distributions or the Euclidean distance between vectors…