This is a really simple problem, yet for the life of me I can't find a solution by searching around. In theory I could simply recode the data, but that is an extreme solution I would rather avoid if I can.

I am simply trying to do a logistic regression with an ordered factor as my predictor. For a toy data set, consider:

  radiation leukemia other total
1         0       13   378   391
2       1-9        5   200   205
3     10-49        5   151   156
4     50-99        3    47    50
5   100-199        4    31    35
6       200       18    33    51

I want to execute the following:

glm(cbind(leukemia,other)~radiation,data=leuk,family=binomial("logit"))

That is, the leukemia counts are the "successes" and the other counts are the "failures". Basically, I am trying to model the dose-response relationship between radiation and the proportional mortality rate for leukemia. However, this model is saturated (as many parameters as observations, so zero residual degrees of freedom):

Call:  glm(formula = cbind(leukemia, other) ~ radiation, family = binomial("logit"), 
    data = leuk)

Coefficients:
     (Intercept)      radiation1-9    radiation10-49  radiation100-199  
         -3.3699           -0.3189           -0.0379            1.3223  
    radiation200    radiation50-99  
          2.7638            0.6184  

Degrees of Freedom: 5 Total (i.e. Null);  0 Residual
Null Deviance:      54.35 
Residual Deviance: -3.331e-15   AIC: 33.67
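
For what it's worth, the zero residual degrees of freedom follow from simple counting: six rows, and an intercept plus five level coefficients. Below is a minimal sketch, assuming the toy table is rebuilt by hand as `leuk` (the names `dose_levels`, `radiation_ord`, `fit_factor`, and `fit_ordered` are made up for illustration). It also suggests that declaring the factor as ordered would not help by itself, since R then uses polynomial contrasts with the same number of parameters:

```r
# Toy data from the question, rebuilt so the example is self-contained.
dose_levels <- c("0", "1-9", "10-49", "50-99", "100-199", "200")
leuk <- data.frame(
  radiation = dose_levels,
  leukemia  = c(13, 5, 5, 3, 4, 18),
  other     = c(378, 200, 151, 47, 31, 33),
  stringsAsFactors = TRUE   # levels get sorted as strings, not by dose
)

levels(leuk$radiation)  # inspect the level order R picked

# Unordered factor: intercept + 5 level coefficients = 6 parameters for 6 rows.
fit_factor <- glm(cbind(leukemia, other) ~ radiation,
                  data = leuk, family = binomial("logit"))
length(coef(fit_factor))
df.residual(fit_factor)

# Ordered factor: polynomial contrasts (.L, .Q, .C, ...), still 6 parameters.
leuk$radiation_ord <- factor(leuk$radiation, levels = dose_levels,
                             ordered = TRUE)
fit_ordered <- glm(cbind(leukemia, other) ~ radiation_ord,
                   data = leuk, family = binomial("logit"))
names(coef(fit_ordered))
df.residual(fit_ordered)
```

Either way the fit reproduces the six observed proportions exactly, which is why the residual deviance is numerically zero.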

I don't want each level of radiation to be its own predictor variable; that makes no sense, especially with only a small number of data points (note that this isn't the real data I am using, just a toy example that is similar). In any case, how do I force R to treat the factor radiation as a single variable with multiple levels? For example, if I do the following:

x<-c(0,1,2,3,4,5)
glm(cbind(leukemia,other)~x,data=leuk,family=binomial("logit"))

Call:  glm(formula = cbind(leukemia, other) ~ x, family = binomial("logit"), 
    data = leuk)

Coefficients:
(Intercept)            x  
    -3.9116       0.5731  

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      54.35 
Residual Deviance: 10.18        AIC: 35.84

This is more in line with what I want. But I am nervous about using that x variable in the regression for fear of changing the interpretation of the results. Similarly, I'd prefer to avoid an irritating system of dummy variables.
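
On the interpretation worry: with an integer score, the slope is the change in log-odds per one-category step, so every step is assumed to have the same effect. Here is a rough sketch of how the score could be derived from the factor itself and how the trend model could be checked against the full factor model. It assumes the toy table is rebuilt by hand, and `dose_levels`, `score`, `fit_trend`, and `fit_factor` are illustrative names only:

```r
# Toy data from the question, with factor levels given in dose order.
dose_levels <- c("0", "1-9", "10-49", "50-99", "100-199", "200")
leuk <- data.frame(
  radiation = factor(dose_levels, levels = dose_levels),
  leukemia  = c(13, 5, 5, 3, 4, 18),
  other     = c(378, 200, 151, 47, 31, 33)
)

# Integer score 0..5 taken from the level order (same values as x above).
leuk$score <- as.integer(leuk$radiation) - 1L

# Linear trend in the log-odds across categories: 2 parameters, 4 residual df.
fit_trend <- glm(cbind(leukemia, other) ~ score,
                 data = leuk, family = binomial("logit"))

# Odds ratio for moving up one dose category.
exp(coef(fit_trend)["score"])

# Deviance test of the linear trend against the saturated factor model.
fit_factor <- glm(cbind(leukemia, other) ~ radiation,
                  data = leuk, family = binomial("logit"))
anova(fit_trend, fit_factor, test = "Chisq")
```

With the slope reported above, exp(0.5731) is roughly 1.77, i.e. under this model the odds of a leukemia death rise by about 77% per category step. The deviance comparison gives some indication of whether equal spacing is a tolerable simplification.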

How do I go about doing this? Or is there a better workaround altogether for studying this type of relationship that I am not considering?

Ryan Simmons
  • 1,873

1 Answer

It is already treated as a single (factor) variable with multiple levels, with radiation 0 as the reference level; R always expands a six-level factor into five coefficients, one per non-reference level. If you want to treat radiation as a numeric variable, you first need to have or create one (e.g. the mean radiation dose within each category).
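
One way to create such a numeric variable is to map each category to a representative dose. The sketch below uses interval midpoints. These values are assumptions rather than anything given in the question, in particular the value used for the top category "200", and `dose_midpoint`, `dose`, and `fit_dose` are hypothetical names:

```r
# Toy data from the question, rebuilt so the example is self-contained.
leuk <- data.frame(
  radiation = c("0", "1-9", "10-49", "50-99", "100-199", "200"),
  leukemia  = c(13, 5, 5, 3, 4, 18),
  other     = c(378, 200, 151, 47, 31, 33)
)

# ASSUMPTION: interval midpoints stand in for the unknown mean dose per
# category; the top category is taken at face value as 200.
dose_midpoint <- c("0" = 0, "1-9" = 5, "10-49" = 29.5, "50-99" = 74.5,
                   "100-199" = 149.5, "200" = 200)

# Look up by label; as.character() guards against factor-code indexing.
leuk$dose <- unname(dose_midpoint[as.character(leuk$radiation)])

# Single numeric predictor: intercept + slope, 4 residual df.
fit_dose <- glm(cbind(leukemia, other) ~ dose,
                data = leuk, family = binomial("logit"))
summary(fit_dose)

# Odds ratio per 100 dose units (the slope itself is per single unit).
exp(100 * coef(fit_dose)["dose"])
```

Unlike the 0 to 5 score, this keeps the unequal spacing of the categories, so the slope reads as a change in log-odds per unit of dose. If the real data record a mean or median dose per group, that would likely be a better choice than midpoints.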

lyolya
  • 11