Most Popular

1500 questions
46
votes
10 answers

How to plot trends properly

I am creating a graph to show trends in death rates (per 1,000 people) in different countries, and the story the plot should tell is that Germany (light blue line) is the only country whose trend is increasing after 1932. This is my first (basic)…
PhDing
  • 3,069
46
votes
2 answers

If only prediction is of interest, why use lasso over ridge?

On page 223 in An Introduction to Statistical Learning, the authors summarise the differences between ridge regression and lasso. They provide an example (Figure 6.9) of when "lasso tends to outperform ridge regression in terms of bias, variance,…
Oliver Angelil
  • 1,202
  • 2
  • 13
  • 24
46
votes
1 answer

How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?

Specifically, I wonder about this statement: "Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default." This is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…
46
votes
5 answers

How to perform two-sample t-tests in R by inputting sample statistics rather than the raw data?

Let's say we have the statistics given below:

gender  mean      sd         n
f       1.666667  0.5773503  3
m       4.500000  0.5773503  4

How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable)…
Alby
  • 2,223
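For orientation, the test statistic asked about above can be computed from the summary numbers alone. The sketch below is in Python rather than R and uses Welch's unequal-variance form, which is an assumption since the excerpt does not say which variant is wanted. The function name `welch_t_from_stats` is hypothetical, and the p-value lookup is left out because it needs a t-distribution CDF from an external library.

```python
import math

def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and degrees of freedom from summary statistics.

    Hypothetical helper for illustration; not taken from the question or its answers.
    """
    # Squared standard error contributed by each group
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics quoted in the question (f vs. m)
t_stat, df = welch_t_from_stats(1.666667, 0.5773503, 3, 4.500000, 0.5773503, 4)
print(f"t = {t_stat:.4f}, df = {df:.4f}")
```

Because the two standard deviations are equal here, the pooled-variance t statistic coincides with the Welch one; only the degrees of freedom differ.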
46
votes
3 answers

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data. I have as input a matrix of sequences of…
sereizam
  • 677
46
votes
4 answers

Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant overall F-statistic

I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations. Regression gives me the following $p$-values from the $t$-test for every slope coefficient:…
Leo
  • 2,634
46
votes
2 answers

Can you explain Parzen window (kernel) density estimation in layman's terms?

Parzen window density estimation is described as $$ p(x)=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2} \phi \left(\frac{x_i - x}{h} \right) $$ where $n$ is number of elements in the vector, $x$ is a vector, $p(x)$ is a probability density of $x$, $h$ is…
user366312
  • 2,090
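The estimator quoted above can be read as "average a small bump centred on each data point". A minimal Python sketch follows, assuming a Gaussian kernel for $\phi$ and writing the scaling as $1/h^d$, which matches the $1/h^2$ in the question when the data are two-dimensional. Both choices and the names `gaussian_kernel` and `parzen_density` are assumptions for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard d-dimensional Gaussian density evaluated at the vector u (assumed choice of phi)."""
    d = len(u)
    squared_norm = sum(c * c for c in u)
    return math.exp(-0.5 * squared_norm) / (2 * math.pi) ** (d / 2)

def parzen_density(x, samples, h):
    """Parzen window estimate p(x) = (1/n) * sum_i (1/h^d) * phi((x_i - x) / h)."""
    d = len(x)
    total = 0.0
    for xi in samples:
        # Scaled offset between the data point and the query point
        u = [(a - b) / h for a, b in zip(xi, x)]
        total += gaussian_kernel(u) / h ** d
    return total / len(samples)

# Three 2-D points; query the density at the origin with bandwidth h = 0.5
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(parzen_density((0.0, 0.0), data, 0.5))
```

A larger `h` widens each bump and smooths the estimate; a smaller `h` makes it spikier around the observed points.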
46
votes
3 answers

"Model failed to converge" warning in lmer()

With the following dataset, I wanted to see if the response (effect) changes with regard to sites, season, duration, and their interactions. Some online statistics forums suggested that I use linear mixed-effects models, but the problem is…
46
votes
7 answers

How to deal with hierarchical / nested data in machine learning

I'll explain my problem with an example. Suppose you want to predict the income of an individual given some attributes: {Age, Gender, Country, Region, City}. You have a training dataset like so train <- data.frame(CountryID=c(1,1,1,1, 2,2,2,2,…
Ben
  • 1,864
46
votes
3 answers

What are the differences between hidden Markov models and neural networks?

I'm just getting my feet wet in statistics, so I'm sorry if this question does not make sense. I have used Markov models to predict hidden states (unfair casinos, dice rolls, etc.) and neural networks to study users' clicks on a search engine. Both…
Lostsoul
  • 703
46
votes
3 answers

Do Bayesian priors become irrelevant with large sample size?

When performing Bayesian inference, we operate by maximizing our likelihood function in combination with the priors we have about the parameters. Because the log-likelihood is more convenient, we effectively maximize $\sum \ln (\text{prior}) + \sum…
pixels
  • 639
46
votes
9 answers

Is it OK to remove outliers from data?

I looked for a way to remove outliers from a dataset and I found this question. In some of the comments and answers to this question, however, people mentioned that it is bad practice to remove outliers from the data. In my dataset I have several…
Sininho
  • 581
46
votes
9 answers

Approximate $e$ using Monte Carlo Simulation

I've been looking at Monte Carlo simulation recently, and have been using it to approximate constants such as $\pi$ (circle inside a rectangle, proportionate area). However, I'm unable to think of a corresponding method of approximating the value of…
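One possible approach, sketched here under assumptions and not taken from the question's answers, relies on a known fact: the expected number of Uniform(0, 1) draws needed for their running sum to exceed 1 equals $e$. The function name `estimate_e` and the fixed seed are illustrative choices.

```python
import random

def estimate_e(trials, seed=0):
    """Estimate e as the average number of Uniform(0, 1) draws whose running sum first exceeds 1."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    total_draws = 0
    for _ in range(trials):
        running_sum = 0.0
        # Keep drawing until the running sum passes 1, counting every draw
        while running_sum <= 1.0:
            running_sum += rng.random()
            total_draws += 1
    return total_draws / trials

print(estimate_e(200_000))
```

The estimate tightens roughly with the square root of the number of trials, as with the area-based approximation of $\pi$ mentioned in the question.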
46
votes
6 answers

What does negative R-squared mean?

Let's say I have some data, and then I fit the data with a model (a non-linear regression). Then I calculate the R-squared ($R^2$). When R-squared is negative, what does that mean? Does that mean my model is bad? I know the range of $R^2$ can be…
RockTheStar
  • 12,907
  • 34
  • 71
  • 96
46
votes
2 answers

Why is Laplace prior producing sparse solutions?

I was looking through the literature on regularization, and often see paragraphs that link L2 regularization with a Gaussian prior, and L1 with a Laplace prior centered on zero. I know what these priors look like, but I don't understand how it translates to,…