Questions tagged [generalized-linear-model]

A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with the "general linear model", which extends the ordinary linear model to a general covariance structure and multivariate responses.)

A generalized linear model extends regression models by allowing a more general (conditional) distribution for the observations, a variance function related to the mean, and a non-linear relationship between the mean and the linear predictor, $X\beta$.

A generalized linear model consists of three components:

  1. Systematic part: $\eta_i = X_i'\beta$. This is the linear predictor.
  2. Random part: $Y_1, Y_2, \ldots, Y_n$ are independent random variables with $$ Y_i \sim D(\mu_i = EY_i),$$ where $D$ is an exponential family distribution. More generally, there can be an additional parameter, the dispersion parameter $\phi$, which controls the dispersion of $Y_i$.
  3. Link function: an invertible function $g$ such that $\eta_i = g(\mu_i)$, or equivalently, $E(Y_i) = \mu_i = g^{-1}(\eta_i) = g^{-1}(X_i'\beta)$.
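
As a rough illustration of how the three components fit together, here is a minimal sketch in plain Python. It assumes one particular choice that the description above does not fix: $D$ is Poisson and $g$ is the log link, with a single covariate plus an intercept. The function name `fit_poisson_glm` and the data are made up for illustration, and the fitting method (Fisher scoring, also known as IRLS) is one common option rather than the only one.

```python
import math

def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit E[Y_i] = exp(b0 + b1 * x_i) by Fisher scoring (hypothetical helper)."""
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        # Systematic part: linear predictor eta_i = b0 + b1 * x_i.
        eta = [b0 + b1 * xi for xi in x]
        # Link function (assumed log): mu_i = g^{-1}(eta_i) = exp(eta_i).
        mu = [math.exp(e) for e in eta]
        # Random part (assumed Poisson): score vector X'(y - mu).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information X'WX with W = diag(mu), since Var(Y_i) = mu_i.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        # Solve the 2x2 system (information) * delta = score by hand.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up count data, purely for illustration.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 8, 12]
b0_hat, b1_hat = fit_poisson_glm(x, y)
print(b0_hat, b1_hat)
```

With the log link, `b1_hat` is read on the log scale: a one-unit increase in `x` multiplies the fitted mean by `exp(b1_hat)`.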

The similar term "general linear model" is often confused with generalized linear models (both are typically abbreviated GLM). A general linear model is the standard multiple regression setting $Y = X\beta + \varepsilon$ (for a "design matrix" $X$, parameters $\beta$, and "error term" $\varepsilon$). Use the tags for those models in such cases (see discussion).

4578 questions
16
votes
3 answers

When to use GLM instead of LM?

When should one use a generalized linear model over a linear model? I know that a generalized linear model allows, for example, the errors to have some distribution other than normal, but why is one concerned with the distributions of the errors? Like why are…
mavavilj
  • 4,109
10
votes
1 answer

Dispersion parameter for Gamma family

I have run a glm in R, and near the bottom of the summary() output, it states: (Dispersion parameter for Gamma family taken to be 1.680014) What does this mean/represent?
8
votes
1 answer

GLM Gaussian vs GLM Binomial vs log-link GLM Gaussian

I am trying to do a study of deaths due to malaria in order to find the best way to predict how dangerous this disease is. I don't have a strong background in statistics, I am an auto-learner building my knowledge using online courses. First, I…
user3378649
  • 1,157
8
votes
1 answer

Understanding of GLM

I have scoured around, reading posts on Cross Validated (Difference between logit and probit models) and also looking at references including Dobson and McCullagh and Nelder, e.g. http://www.statsci.org/glm/books.html so I am aware that this topic…
t-student
  • 800
7
votes
2 answers

Starting coefficient vector for GLM

I would like to know how R chooses its starting coefficient vector for a GLM when its start argument is left blank and defaults to NULL. For my personal implementation of a GLM, I have simply initialized $ \boldsymbol \beta_0 $ to be all $1$s.…
Jon Claus
  • 605
6
votes
2 answers

Confusion related to generalized linear model example

I was reading this article related to generalized linear models: http://en.wikipedia.org/wiki/Generalized_linear_models. It gave a specific example: Ordinary linear regression predicts the expected value of a given unknown quantity (the response…
6
votes
1 answer

Why is generalized linear model (GLM) a semi-parametric model?

As we all know, the GLM has the structure $G(EY)=X^{T}\beta$, in which $G(\cdot)$ is a known link function. What confuses me is that some people say that it's a semiparametric model. But in my opinion, it's a parametric model, because there is no…
5
votes
1 answer

Why does the linear test statistic of GLM follow F-distribution?

As a MATLAB user, I have been using coefTest to perform linear hypothesis testing. For example in $y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3$, if I want to test if $\beta_1=\beta_2$, then I can simply use a linear contrast…
5
votes
2 answers

Post-hoc after GLM

I am running a GLM, using the function glm.nb (pscl package), trying to figure out what could influence a particular trait in several locations and years. The output is as follows (with slight modification and removing the things beyond this question) …
user36491
  • 405
4
votes
2 answers

Why does my residual-fitted plot look like this?

I'm using a glm Poisson regression in R, and I ran model diagnostics after fitting my model, but the residual distribution is so weird.
geeh
  • 41
4
votes
1 answer

How to interpret coefficient standard errors for logistic regression

The coefficients estimated for logistic regression are in log odds, and I understand it is common -- at least when interpreting the output -- to convert the log odds to odds so they're more easily understandable. When reporting results for logistic…
4
votes
1 answer

How to choose the family in Generalized Linear Model in R

I would like to know how to choose the family in generalized linear models in R. Roughly, I have learned that family=binomial or family=poisson should be used if the dependent variable (y) is binary or count data, respectively. How about others here? Especially, in…
B Sann
  • 41
  • 3
4
votes
2 answers

Consequence of choosing wrong functional of covariates in GLM/GAM

I'm modelling the mood of teenagers in a really big school. The response is 'good mood' or 'bad mood'. One of the variables used to explain the students' mood is "Area of residence". The explanatory variable "Area of residence" has 5 categories:…
Erosennin
  • 1,734
4
votes
1 answer

Deviance and saturated models

Deviance is defined as $$ -2(\log L_0 - \log L_s ), $$ where $L_s$ is the likelihood of the saturated model. One definition of a saturated model is "a model with a parameter for every observation so that the data are fitted exactly". How can I…
Ievgen
  • 231
4
votes
1 answer

What family is used in glm for Continuous predictor vs Continuous outcome

I am going to use a glm to estimate nutrient concentration as a function of river flow. My nutrient concentrations are not normally distributed and the variance is not constant. So, I would like to try a GLM but am not sure what family I should use. Data are…
Farshad
  • 43