Most Popular

1500 questions
71 votes, 2 answers

What does the inverse of covariance matrix say about data? (Intuitively)

I'm curious about the nature of $\Sigma^{-1}$. Can anybody say something intuitive about what $\Sigma^{-1}$ says about the data? Edit: Thanks for the replies. After taking some great courses, I'd like to add some points: It is a measure of information,…
Arya • 973

71 votes, 5 answers

How to derive the ridge regression solution?

I am having some issues with the derivation of the solution for ridge regression. I know the regression solution without the regularization term is given by: $$\beta = (X^\top X)^{-1}X^\top y.$$ But after adding the L2 term $\lambda\|\beta\|_2^2$ to…
user34790 • 6,757 • 10 • 46 • 69

71 votes, 2 answers

Removing duplicated rows from a data frame in R

How can I remove duplicate rows from this example data frame?

A 1
A 1
A 2
B 4
B 1
B 1
C 2
C 2

I would like to remove the duplicates based on both columns:

A 1
A 2
B 4
B 1
C 2

Order is not important.
Jana • 969

71 votes, 2 answers

What is the difference between a partial likelihood, profile likelihood and marginal likelihood?

I see these terms being used and I keep getting them mixed up. Is there a simple explanation of the differences between them?
Rob Hyndman • 56,782

71 votes, 8 answers

Regression with multiple dependent variables?

Is it possible to have a (multiple) regression equation with two or more dependent variables? Sure, you could run two separate regression equations, one for each DV, but that doesn't seem like it would capture any relationship between the two DVs?
Jeff • 3,927

71 votes, 19 answers

What are some valuable Statistical Analysis open source projects?

What are some valuable Statistical Analysis open source projects available right now? Edit: as pointed out by Sharpie, valuable could mean helping you get things done faster or more cheaply.
grokus • 233

71 votes, 4 answers

What is the definition of a "feature map" (aka "activation map") in a convolutional neural network?

Intro / Background: Within a convolutional neural network, we usually have a general structure / flow that looks like this: input image (i.e. a 2D vector x); (1st Convolutional layer (Conv1) starts here...); convolve a set of filters (w1) along the…
Atlas7 • 813 • 1 • 7 • 7

71 votes, 3 answers

Neural Network: For Binary Classification use 1 or 2 output neurons?

Assume I want to do binary classification (something belongs to class A or class B). There are some possibilities to do this in the output layer of a neural network: Use 1 output node. Output 0 (<0.5) is considered class A and 1 (>=0.5) is…
robert • 1,111

71 votes, 5 answers

What problem do shrinkage methods solve?

The holiday season has given me the opportunity to curl up next to the fire with The Elements of Statistical Learning. Coming from a (frequentist) econometrics perspective, I'm having trouble grasping the uses of shrinkage methods like ridge…
Charlie • 14,062 • 5 • 44 • 72

71 votes, 6 answers

Test if two binomial distributions are statistically different from each other

I have three groups of data, each with a binomial distribution (i.e. each group has elements that are either success or failure). I do not have a predicted probability of success, but instead can only rely on the success rate of each as an…
Scott • 1,030

71 votes, 3 answers

Interpreting Residual and Null Deviance in GLM R

How to interpret the Null and Residual Deviance in GLM in R? Like, we say that smaller AIC is better. Is there any similar and quick interpretation for the deviances also?

Null deviance: 1146.1 on 1077 degrees of freedom
Residual deviance: 4589.4…
Anjali • 1,011 • 3 • 11 • 10

70 votes, 7 answers

Why doesn't Random Forest handle missing values in predictors?

What are the theoretical reasons for not handling missing values? Gradient boosting machines and regression trees handle missing values. Why doesn't Random Forest do that?

70 votes, 8 answers

What are good basic statistics to use for ordinal data?

I have some ordinal data gathered from survey questions. In my case they are Likert-style responses (Strongly Disagree-Disagree-Neutral-Agree-Strongly Agree). In my data they are coded as 1-5. I don't think means would mean much here, so what basic…
PaulHurleyuk • 1,569

70 votes, 6 answers

Standard errors for lasso prediction using R

I'm trying to use a LASSO model for prediction, and I need to estimate standard errors. Surely someone has already written a package to do this. But as far as I can see, none of the packages on CRAN that do predictions using a LASSO will return…
Rob Hyndman • 56,782

70 votes, 6 answers

Is it important to scale data before clustering?

I found this tutorial, which suggests that you should run the scale function on features before clustering (I believe that it converts data to z-scores). I'm wondering whether that is necessary. I'm asking mostly because there's a nice elbow point…
Jeremy • 1,429