Highest Voted Questions - Statistical Analysis Stack Exchange

46

votes

10 answers

How to plot trends properly

I am creating a graph to show trends in death rates (per 1,000 people) in different countries, and the story that should come from the plot is that Germany (light blue line) is the only one whose trend is increasing after 1932. This is my first (basic)…

data-visualization

asked Jun 05 '18 at 07:49

PhDing

3,069

46

votes

2 answers

If only prediction is of interest, why use lasso over ridge?

On page 223 in An Introduction to Statistical Learning, the authors summarise the differences between ridge regression and lasso. They provide an example (Figure 6.9) of when "lasso tends to outperform ridge regression in terms of bias, variance,…

asked Mar 05 '18 at 10:19

Oliver Angelil

1,202
2
13
24

46

votes

1 answer

How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?

Specifically, I suppose I wonder about this statement: Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default. Which is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…

asked Feb 07 '18 at 16:35

Christian Eriksson

573

46

votes

5 answers

How to perform two-sample t-tests in R by inputting sample statistics rather than the raw data?

Let's say we have the statistics given below:

gender  mean      sd         n
f       1.666667  0.5773503  3
m       4.500000  0.5773503  4

How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable)…

asked Jun 13 '12 at 16:15

Alby

2,223
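
The question above asks about R, but the arithmetic behind a two-sample t-test from summary statistics can be sketched in a few lines. The following is a minimal illustration in plain Python, not an answer taken from the thread: the function names are hypothetical, and only the means, standard deviations, and sample sizes come from the excerpt. It computes the t statistic and degrees of freedom for both the pooled-variance (Student) and Welch variants; a p-value would additionally require the t distribution's CDF, for example via `scipy.stats.ttest_ind_from_stats`.

```python
import math

def pooled_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic (unequal variances allowed)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df

# Summary statistics from the question: group f, then group m.
t_pooled, df_pooled = pooled_t_from_stats(1.666667, 0.5773503, 3, 4.5, 0.5773503, 4)
t_welch, df_welch = welch_t_from_stats(1.666667, 0.5773503, 3, 4.5, 0.5773503, 4)

print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.4f}")
```

Because the two standard deviations are identical here, both variants give the same t statistic and differ only in the degrees of freedom.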

46

votes

3 answers

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data. I have as input a matrix of sequences of…

asked Apr 19 '17 at 05:52

sereizam

677

46

votes

4 answers

Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant overall F-statistic

I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations. Regression gives me the following $p$-values from the $t$-test for every slope coefficient:…

asked Mar 15 '12 at 19:56

Leo

2,634

46

votes

2 answers

Can you explain Parzen window (kernel) density estimation in layman's terms?

Parzen window density estimation is described as $$ p(x)=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2} \phi \left(\frac{x_i - x}{h} \right) $$ where $n$ is number of elements in the vector, $x$ is a vector, $p(x)$ is a probability density of $x$, $h$ is…

asked Nov 03 '16 at 14:30

user366312

2,090
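
As a rough illustration of the formula quoted in the Parzen window question, here is a minimal sketch under stated assumptions: the data are one-dimensional, the window function $\phi$ is a standard Gaussian, and the normalization is $1/h^d$ with $d = 1$ (the $1/h^2$ in the excerpt would correspond to two-dimensional $x$). The function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the window function phi."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, h):
    """Parzen window estimate p(x) = (1/n) * sum_i (1/h) * phi((x_i - x) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((xi - x) / h) for xi in samples) / (n * h)

# Illustrative data (made up): three points close together and one far away.
samples = [1.0, 1.5, 2.0, 4.0]
for x in (1.5, 3.0, 4.0):
    print(f"p({x}) = {parzen_density(x, samples, h=0.5):.4f}")
```

Each sample contributes a small bump of width $h$ centred on itself, and the estimate at $x$ is the average height of those bumps; a larger $h$ gives a smoother, flatter estimate.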

46

votes

3 answers

"Model failed to converge" warning in lmer()

With the following dataset, I wanted to see if the response (effect) changes with regard to sites, season, duration, and their interactions. Some online statistics forums suggested that I use linear mixed-effects models, but the problem is…

asked Oct 24 '16 at 16:33

Syamkumar. R

553

46

votes

7 answers

How to deal with hierarchical / nested data in machine learning

I'll explain my problem with an example. Suppose you want to predict the income of an individual given some attributes: {Age, Gender, Country, Region, City}. You have a training dataset like so train <- data.frame(CountryID=c(1,1,1,1, 2,2,2,2,…

asked Jun 30 '16 at 00:36

Ben

1,864

46

votes

3 answers

What are the differences between hidden Markov models and neural networks?

I'm just getting my feet wet in statistics, so I'm sorry if this question does not make sense. I have used Markov models to predict hidden states (unfair casinos, dice rolls, etc.) and neural networks to study users' clicks on a search engine. Both…

asked Dec 31 '11 at 21:03

Lostsoul

703

46

votes

3 answers

Do Bayesian priors become irrelevant with large sample size?

When performing Bayesian inference, we operate by maximizing our likelihood function in combination with the priors we have about the parameters. Because the log-likelihood is more convenient, we effectively maximize $\sum \ln (\text{prior}) + \sum…

asked Mar 10 '16 at 14:42

pixels

639

46

votes

9 answers

Is it OK to remove outliers from data?

I looked for a way to remove outliers from a dataset and I found this question. In some of the comments and answers to this question, however, people mentioned that it is bad practice to remove outliers from the data. In my dataset I have several…

asked Mar 08 '16 at 12:54

Sininho

581

46

votes

9 answers

Approximate $e$ using Monte Carlo Simulation

I've been looking at Monte Carlo simulation recently, and have been using it to approximate constants such as $\pi$ (circle inside a rectangle, proportionate area). However, I'm unable to think of a corresponding method of approximating the value of…

asked Feb 04 '16 at 12:13

statisticnewbie12345

151
1
3
5
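
The area-ratio approach to $\pi$ mentioned in the question above can be sketched as follows, here using a quarter circle inside the unit square. For $e$, the sketch uses one well-known estimator that is not taken from the excerpt: the expected number of Uniform(0, 1) draws needed for their running sum to exceed 1 equals $e$. Function names, sample sizes, and the seed are illustrative assumptions.

```python
import random

def estimate_pi(n_points, rng):
    """Fraction of uniform points in the unit square that fall inside
    the quarter circle of radius 1, scaled by 4."""
    inside = sum(
        1 for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_points

def estimate_e(n_trials, rng):
    """Average number of Uniform(0, 1) draws needed for the running sum
    to exceed 1; the expectation of that count is e."""
    total_draws = 0
    for _ in range(n_trials):
        running_sum = 0.0
        while running_sum <= 1.0:
            running_sum += rng.random()
            total_draws += 1
    return total_draws / n_trials

rng = random.Random(0)  # fixed seed so the run is reproducible
pi_hat = estimate_pi(200_000, rng)
e_hat = estimate_e(200_000, rng)
print(f"pi is approximately {pi_hat:.4f}")
print(f"e  is approximately {e_hat:.4f}")
```

Both estimators converge at the usual Monte Carlo rate, so the error shrinks roughly in proportion to one over the square root of the number of trials.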

46

votes

6 answers

What does negative R-squared mean?

Let's say I have some data, and then I fit the data with a model (a non-linear regression). Then I calculate the R-squared ($R^2$). When R-squared is negative, what does that mean? Does that mean my model is bad? I know the range of $R^2$ can be…

asked Nov 24 '15 at 02:17

RockTheStar

12,907
34
71
96
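
For the negative R-squared question above, a small worked example may help. It assumes the common definition $R^2 = 1 - SS_{res}/SS_{tot}$, which the excerpt does not state explicitly, and uses made-up data. Under that definition $R^2$ drops below zero whenever the model's predictions fit worse than simply predicting the mean of the observations.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0, 5.0]

r2_good = r_squared(y, [1.1, 1.9, 3.2, 3.9, 5.1])  # close fit
r2_mean = r_squared(y, [3.0] * 5)                  # always predict the mean
r2_bad = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])   # trend reversed

print(r2_good, r2_mean, r2_bad)
```

Here the mean-only prediction gives exactly 0 and the reversed-trend prediction gives -3, since its residual sum of squares (40) is four times the total sum of squares (10).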

46

votes

2 answers

Why is Laplace prior producing sparse solutions?

I was looking through the literature on regularization, and often see paragraphs that link L2 regularization with a Gaussian prior, and L1 with a Laplace prior centered on zero. I know what these priors look like, but I don't understand how it translates to,…

asked Oct 16 '15 at 08:10

Dmitry Smirnov

715
