I have some categorical variables that I want to use as predictors. One of them is slope, which is categorized into five classes: < 10 deg, 10-20 deg, 20-30 deg, 30-40 deg, and > 40 deg. I have taken the first class (< 10 deg) as the reference category. My problem is interpreting the beta value of the reference category, because it is not displayed in the binary logistic regression output. What is the value of beta for the reference category? Any suggestions would be appreciated.
It depends on the software you use, but generally dummy coding is used, whereby the intercept reflects the mean of the reference category (in logistic regression, its log odds) and all other coefficients for your categorical variable reflect the deviation from that value. If you have other predictors in your model, replace 'mean' by 'adjusted mean'. Here is an overview of coding systems with R: R Library: Contrast Coding Systems for categorical variables. – chl Jul 28 '12 at 09:33
2 Answers
Setting
Let $X$ be the categorical predictor and suppose it has 3 levels ($X = 1$, $X = 2$, and $X = 3$). Let the third level be the reference category.
Define $X_1$ and $X_2$ as follows:
$$ X_1 = \left\{ \begin{array}{ll} 1 & \textrm{if } X = 1 \\ 0 & \textrm{otherwise;} \end{array} \right. $$
$$ X_2 = \left\{ \begin{array}{ll} 1 & \textrm{if } X = 2 \\ 0 & \textrm{otherwise.} \end{array} \right. $$
If you know both $X_1$ and $X_2$ then you know $X$. In particular, if $X_1 = 0$ and $X_2 = 0$ then $X = 3$.
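As a rough illustration of this coding applied to the five slope classes from the question, here is a minimal sketch in plain Python. The names `LEVELS`, `REFERENCE`, and `dummy_code` are made up for this example and do not come from any particular statistics package.

```python
# Slope classes from the question; the first one is the reference category.
LEVELS = ["<10", "10-20", "20-30", "30-40", ">40"]
REFERENCE = "<10"


def dummy_code(value, levels=LEVELS, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference level (hypothetical helper)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # The reference level gets no column of its own: it is the all-zero row.
    return [1 if value == level else 0 for level in levels if level != reference]


for level in LEVELS:
    print(f"{level:>6} -> {dummy_code(level)}")
```

Five classes give four indicator columns, and the reference class is the row in which all four are zero, which is why the software prints no separate beta for it.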
Logistic regression model
The model is written $$ \log \left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} $$ where $\pi_i$ denotes the probability of success of individual $i$ with covariate information $(x_{1i}, x_{2i})$.
- If individual $i$ falls in category $1$ then $x_{1i} = 1$, $x_{2i} = 0$ and $\log \left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0 + \beta_1$.
- If individual $i$ falls in category $2$ then $x_{1i} = 0$, $x_{2i} = 1$ and $\log \left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0 + \beta_2$.
- If individual $i$ falls in category $3$ then $x_{1i} = 0$, $x_{2i} = 0$ and $\log \left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0$.
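The three cases above can be checked numerically. This is only a sketch: the coefficient values are invented for illustration and are not estimates from any data.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1, beta2 = -1.0, 0.8, 0.3


def log_odds(x1, x2):
    """Linear predictor: beta0 + beta1 * x1 + beta2 * x2."""
    return beta0 + beta1 * x1 + beta2 * x2


def probability(eta):
    """Inverse logit: turn log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Dummy codes (x1, x2) for categories 1, 2 and 3 (3 is the reference).
codes = {1: (1, 0), 2: (0, 1), 3: (0, 0)}

for category, (x1, x2) in codes.items():
    eta = log_odds(x1, x2)
    print(category, round(eta, 3), round(probability(eta), 3))
```

For the reference category both dummies are zero, so its log odds is the intercept $\beta_0$ alone.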
Odds ratio
Odds ratios are computed with respect to the reference category. For example, for 'category 1 vs category 3' we have
$$ \frac{\exp(\beta_0 + \beta_1)}{\exp(\beta_0)} = \exp(\beta_1). $$
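A quick numerical check of this identity, again with made-up coefficients. It also shows that a comparison between two non-reference categories follows from the same model, since the intercept cancels in the same way.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1, beta2 = -1.0, 0.8, 0.3

odds_cat1 = math.exp(beta0 + beta1)  # odds of success in category 1
odds_cat2 = math.exp(beta0 + beta2)  # odds of success in category 2
odds_cat3 = math.exp(beta0)          # odds of success in the reference category

# Category 1 vs the reference: the intercept cancels, leaving exp(beta1).
or_1_vs_3 = odds_cat1 / odds_cat3

# Category 1 vs category 2: the intercept cancels, leaving exp(beta1 - beta2).
or_1_vs_2 = odds_cat1 / odds_cat2

print(or_1_vs_3, math.exp(beta1))
print(or_1_vs_2, math.exp(beta1 - beta2))
```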
This is standard for a single categorical variable. The intercept is the log odds for the reference category, and the dummy-variable betas are the differences in log odds compared with the reference category. So an "insignificant" dummy variable means that the log odds for that category are not significantly different from those of the reference category. This is the same as ordinary ANOVA, just on the log-odds scale instead of the raw scale.
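To make this concrete, here is a small sketch with invented counts for the five slope classes. It assumes the slope class is the only predictor in the model and that no cell is empty. In that case the fitted probabilities equal the observed proportions in each class, so the coefficients can be read directly from the counts. The counts and all names are hypothetical.

```python
import math

# Hypothetical (events, non-events) counts per slope class.
counts = {
    "<10": (20, 180),
    "10-20": (35, 165),
    "20-30": (50, 150),
    "30-40": (70, 130),
    ">40": (90, 110),
}
reference = "<10"


def sample_log_odds(events, non_events):
    """Observed log odds in one class."""
    return math.log(events / non_events)


# Intercept: the log odds of the reference class.
intercept = sample_log_odds(*counts[reference])

# Each dummy coefficient: log odds of that class minus the intercept.
betas = {
    level: sample_log_odds(*cell) - intercept
    for level, cell in counts.items()
    if level != reference
}

print("intercept:", round(intercept, 3))
for level, beta in betas.items():
    print(level, round(beta, 3), "OR vs reference:", round(math.exp(beta), 3))
```

With other predictors in the model the same reading holds, except that the intercept is then the log odds for the reference class when the other predictors are at zero (or at their own reference levels).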