Most Popular

1500 questions
44 votes
2 answers

How to derive the standard error of linear regression coefficient

For the univariate linear regression model $$y_i = \beta_0 + \beta_1 x_i+\epsilon_i$$ given the data set $D=\{(x_1,y_1),\ldots,(x_n,y_n)\}$, the coefficient estimates are $$\hat\beta_1=\frac{\sum_i x_iy_i-n\bar x\bar y}{\sum_i x_i^2-n\bar x^2}$$…
avocado
  • 3,581
44 votes
5 answers

What is the significance of logistic regression coefficients?

I am currently reading a paper concerning voting location and voting preference in the 2000 and 2004 elections. In it, there is a chart which displays logistic regression coefficients. From courses years back and a little reading up, I understand…
44 votes
4 answers

How can SVM 'find' an infinite feature space where linear separation is always possible?

What is the intuition behind the fact that an SVM with a Gaussian Kernel has infinite dimensional feature space?
user36162
  • 581
44 votes
9 answers

Why use vector error correction model?

I am confused about the Vector Error Correction Model (VECM). Technical background: VECM offers a possibility to apply Vector Autoregressive Model (VAR) to integrated multivariate time series. In the textbooks they name some problems in applying a…
DatamineR
  • 1,627
44 votes
2 answers

How to interpret glmnet?

I am trying to fit a multivariate linear regression model with approximately 60 predictor variables and 30 observations, so I am using the glmnet package for regularized regression because p>n. I have been going through documentation and other…
Alice
  • 885
44 votes
1 answer

Manually calculated $R^2$ doesn't match up with randomForest() $R^2$ for testing new data

I know this is a fairly specific R question, but I may be thinking about the proportion of variance explained, $R^2$, incorrectly. Here goes. I'm trying to use the R package randomForest. I have some training data and testing data. When I fit a random…
44 votes
9 answers

Tiny (real) datasets for giving examples in class?

When teaching an introductory-level class, the teachers I know tend to invent some numbers and a story in order to exemplify the method they are teaching. What I would prefer is to tell a real story with real numbers. However, these stories need…
Tal Galili
  • 21,541
44 votes
4 answers

Polynomial regression using scikit-learn

I am trying to use scikit-learn for polynomial regression. From what I read, polynomial regression is a special case of linear regression. I was hoping that maybe one of scikit's generalized linear models can be parameterised to fit higher order…
44 votes
5 answers

How to derive the least square estimator for multiple linear regression?

In the simple linear regression case $y=\beta_0+\beta_1x$, you can derive the least squares estimator $\hat\beta_1=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$ such that you don't have to know $\hat\beta_0$ to estimate…
Saber CN
  • 819
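
As a rough illustration of the simple-regression estimator quoted in the excerpt above, here is a minimal Python sketch. It assumes NumPy is available; the function names `slope_centered` and `slope_expanded` and the five data points are made up for this example and do not come from the question. It computes the slope in the centered form $\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$ and in the algebraically equivalent expanded form $\frac{\sum x_iy_i-n\bar x\bar y}{\sum x_i^2-n\bar x^2}$, then recovers the intercept from the slope.

```python
import numpy as np


def slope_centered(x, y):
    """Slope estimate using deviations from the sample means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return float(np.sum(dx * (y - y.mean())) / np.sum(dx ** 2))


def slope_expanded(x, y):
    """Same estimate written with raw sums (expanded numerator and denominator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    num = np.sum(x * y) - n * x.mean() * y.mean()
    den = np.sum(x ** 2) - n * x.mean() ** 2
    return float(num / den)


# Hypothetical toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1 = slope_centered(x, y)
# The intercept follows from the slope: b0 = mean(y) - b1 * mean(x).
b0 = float(np.mean(y) - b1 * np.mean(x))

print("slope (centered form):", b1)
print("slope (expanded form):", slope_expanded(x, y))
print("intercept:", b0)
```

Both forms should agree up to floating-point rounding; the centered form is usually the safer one numerically when the $x_i$ are large relative to their spread.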
44 votes
2 answers

Evidence for man-made global warming hits 'gold standard': how did they do this?

This message in a Reuters article from 25.02.2019 is currently all over the news: Evidence for man-made global warming hits 'gold standard' [Scientists] said confidence that human activities were raising the heat at the Earth’s surface had reached…
44 votes
2 answers

Simulation of logistic regression power analysis - designed experiments

This question is in response to an answer given by @Greg Snow regarding a question I asked concerning power analysis with logistic regression and SAS Proc GLMPOWER. If I am designing an experiment and will analyze the results in a factorial…
B_Miner
  • 8,630
44 votes
6 answers

How to quasi match two vectors of strings (in R)?

I am not sure how this should be termed, so please correct me if you know a better term. I've got two lists: one of 55 items (e.g., a vector of strings), the other of 92. The item names are similar but not identical. I wish to find the best…
Tal Galili
  • 21,541
44 votes
6 answers

Why do I get a 100% accuracy decision tree?

I'm getting a 100% accuracy for my decision tree. What am I doing wrong? This is my code: import pandas as pd import json import numpy as np import sklearn import matplotlib.pyplot as plt data =…
Nadjla
  • 451
44 votes
1 answer

Existence of the moment generating function and variance

Can a distribution with finite mean and infinite variance have a moment generating function? What about a distribution with finite mean and finite variance but infinite higher moments?
Mgf
  • 441
44 votes
3 answers

Training loss increases with time

I am training a model (Recurrent Neural Network) to classify 4 types of sequences. As I run my training, I see the training loss going down until the point where I correctly classify over 90% of the samples in my training batches. However, a couple of…
dins2018
  • 443