2

I tried to plot the results of an ordered logistic regression analysis by calculating the probabilities of endorsing every answer category of the dependent variable (6-point Likert scale, ranging from "1" to "6"). However, I've received strange probabilities when I calculated the probabilities based on this formula: $\rm{Pr}(y_i \le k|X_i) = \rm{logit}^{-1}(X_i\beta)$.

Below you see how exactly I tried to calculate the probabilities and plot the results of the ordered logistic regression model (m2) that I fitted using the polr function (MASS package). The probabilities (probLALR) that I calculated and used to plot an "expected mean score" are puzzling as the expcected mean score in the plot increases along the RIV.st continuum while the coefficient for RIV.st is negative (-0.1636). I would have expected that the expected mean score decreases due to the negative main effect of RIV.st and the irrelevance of the interaction terms for the low admiration and low rivalry condition (LALR) of the current 2 by 2 design (first factor = f.adm; second factor = f.riv; dummy coding 0 and 1).

Any idea of how to make sense of the found pattern? Is this the right way to calculate the probabilities? The way I used the intercepts in the formula to calculate the probabilities might be problematic (cf., Negative coefficient in ordered logistic regression).

m2 <- polr(short.f ~ 1 + f.adm*f.riv + f.adm*RIV.st + f.riv*RIV.st, data=sampleNS)

# f.adm  = dummy (first factor of 2 by 2 design);
# f.riv  = dummy (second factor of 2 by 2 design);
# RIV.st = continuous predictor (standardized)
summary(m2)
Coefficients:
                Value Std. Error t value
f.adm1         1.0203    0.14959  6.8203
f.riv1        -0.8611    0.14535 -5.9240
RIV.st        -0.1636    0.09398 -1.7403
f.adm1:f.riv1 -1.2793    0.20759 -6.1625
f.adm1:RIV.st  0.0390    0.10584  0.3685
f.riv1:RIV.st  0.6989    0.10759  6.4953

Intercepts:
    Value    Std. Error t value 
1|2  -2.6563   0.1389   -19.1278
2|3  -1.2139   0.1136   -10.6898
3|4  -0.3598   0.1069    -3.3660
4|5   0.9861   0.1121     8.7967
5|6   3.1997   0.1720    18.6008

Here you see how I tried to calculate the probabilities (probLALR) for 1 of the 4 conditions of the 2 by 2 design:

inv.logit  <- function(x){ return(exp(x)/(1+exp(x))) }
Pred       <- seq(-3, 3, by=0.01)
b = c(-2.6563,-1.2139,-0.3598,0.9861,3.1997) # intercepts of model m2
a = c(1.0203,-0.8611,-0.1636,-1.2793,0.0390,0.6989) # coefficients of m2
probLALR   <- data.frame(matrix(NA,601,5))
for (k in 1:5){ 
    probLALR[,k] <- inv.logit(b[k] + a[1]*0 + a[2]*0 + 
                               a[3]*Pred  + a[4]*0*0 + 
                               a[5]*Pred*0 + a[6]*Pred*0)
}

plot(Pred,probLALR[,1],type="l",ylim=c(0,1))   # p(1)
lines(Pred,probLALR[,2],col="red")             # p(1 or 2)
lines(Pred,probLALR[,3],col="green")           # P(1 or 2 or 3)
lines(Pred,probLALR[,4],col="orange")          # P(1 or 2 or 3 or 4)
lines(Pred,probLALR[,5],col="orange")          # P(1 or 2 or 3 or 4 or 5)

# option response functions:

orc = matrix(NA,601,6)
orc[,6] = 1-probLALR[,5]        # prob of 6
orc[,5]= probLALR[,5]-probLALR[,4]  # prob of 5
orc[,4]= probLALR[,4]-probLALR[,3]  # prob of 4
orc[,3]= probLALR[,3]-probLALR[,2]  # prob of 3
orc[,2]= probLALR[,2]-probLALR[,1]  # prob of 2
orc[,1]= probLALR[,1]           # prob of 1


plot(Pred,orc[,1],type="l",ylim=c(0,1))   # p(1)
lines(Pred,orc[,2],col="red")             # p(2)
lines(Pred,orc[,3],col="green")           # P(3)
lines(Pred,orc[,4],col="orange")          # P(4)
lines(Pred,orc[,5],col="purple")          # P(5)
lines(Pred,orc[,6],col="purple")          # P(6)

# mean score

mean = orc[,1]*1+orc[,2]*2+orc[,3]*3+orc[,4]*4+orc[,5]*5+orc[,6]*6
plot(Pred,mean,type="l",xlab="RIV.st",ylab="expected mean score",ylim=c(1,6))  

1 Answers1

1

Thanks to Wilco Emons for the following solution to the problem:

In polr the cumulative link model is parameterized a bit different than in Agresti’s book that is referred to. The problem can be easily solved by changing the code line:

probLALR[,k] <- inv.logit(b[k] +  a[1]*0 + a[2]*0 + a[3]*Pred + a[4]*0*0 + 
                                  a[5]*Pred*0 + a[6]*Pred*0                ) 

into

probLALR[,k] <- inv.logit(b[k] - (a[1]*0 + a[2]*0 + a[3]*Pred + a[4]*0*0 + 
                                  a[5]*Pred*0 + a[6]*Pred*0)               ) 

Thanks also to Achim Zeileis for his suggestion to use predict(m2, type="prob")! Below you will find a way to calculate the probabilities by the means of the predict() function:

Pred             <- seq(-3, 3, by=0.01)
PRED.LALR        <- data.frame(rep(NA,601))
PRED.LALR$f.adm  <- as.factor(rep(0,601))
PRED.LALR$f.riv  <- as.factor(rep(0,601))
PRED.LALR$RIV.st <- Pred
prob.LALR        <- predict(m2,PRED.LALR,type="prob")

scoreLALR        <- prob.LALR[,1]*1 + prob.LALR[,2]*2 + prob.LALR[,3]*3 + 
                    prob.LALR[,4]*4 + prob.LALR[,5]*5 + prob.LALR[,6]*6
plot(Pred, scoreLALR, col="green", ylim=c(1,6))