
I know that logistic regression is a convex problem. Furthermore, by Lemma 1.17 in these optimization lecture notes, if a function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex, then the function obtained by fixing one of its arguments should also be convex. For example, if $f(x,y,z)$ is convex, then $g(x,y) = f(x,y,2)$ should be convex (assuming $f$ is defined at $z = 2$), right?
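
(My own sketch of the argument, not a quote from the notes: for $u, v \in \mathbb{R}^2$ and $\lambda \in [0,1]$,

$$ g(\lambda u + (1-\lambda) v) = f\big(\lambda (u,2) + (1-\lambda)(v,2)\big) \le \lambda f(u,2) + (1-\lambda) f(v,2) = \lambda g(u) + (1-\lambda) g(v), $$

so $g$ inherits convexity from $f$.)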

However, I have been experimenting with my own implementation of logistic regression and have found that the loss function (i.e. the negative log-likelihood) does not appear to be convex. All of my code is at the bottom of this post.

I have created the following fake dataset which can be perfectly classified:

[Plot of the dataset]

I defined a loss function as the negative log likelihood, scaled by $1/n$:

$$ L(\beta) = - \frac{1}{n}\sum_{i=1}^{n} \left[ y_i \log(p_i) + (1-y_{i})\log(1-p_i) \right] $$

where

$$p_i = \dfrac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})}{1+\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})}$$
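
For reference, a direct translation of this expression into R (a sketch only; `X` and `w` here denote the design matrix with a leading column of ones and the coefficient vector used in the code below) would be:

eta <- X %*% w                    # linear predictor beta_0 + beta_1*x1 + beta_2*x2
p   <- exp(eta) / (1 + exp(eta))  # equivalently: plogis(eta)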

My optimization routine returns the following solution (which seems to be correct, since it classifies the data perfectly):

$$\hat{\mathbf{\beta}} = (0, -20, 23)$$

If I plot the loss function with the third parameter held fixed, $g(a,b) = L((a, b, 23))$, it does not appear convex. I have tried this for multiple values of the third parameter.

[Plots of the restricted loss for two different fixed values of the third parameter]

What is going on here?

#
# Create dataset
#
x1 <- rep(seq(1, 10, length = 100),2)
x2 <- c(x1[1:100]+3 , x1[101:200]+5)
# Scale
x1 <- scale(x1)
x2 <- scale(x2)
# Store in data frame
dat <- data.frame(x1 = x1, x2 = x2,
    y = c(rep(0, 100), rep(1,100))
)
# Plot
plot(dat$x1, dat$x2, col = dat$y + 1, main = "Logistic Regression Dataset")
#
# Loss Function
#
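# Sigmoid, hard-clamped to exactly 0 or 1 when |a| > 18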
sigmoid <- function(a){
    sapply(a, function(arg){
        if(arg > 18){return(1)}
        if(arg < -18){return(0)}
        return(1/(1+exp(-arg)))
    })
}
# A log function that is zero when its argument is zero, not -Inf
log0 <- function(a){
    sapply(a, function(arg){
        if(arg==0){return(0)}
        return(log(arg))
    })
}
loss <- function(y,x,w){
    # x a matrix with unity first col, w a vector
    if(all(x[,1] == 1) == FALSE){
        stop("first column of x must be unity")
    }
    # Compute vector of dot products
    sigmoid.dot <- sigmoid(x %*% w)
    # Compute elementwise loss
    li <- y*log0(sigmoid.dot)+(1-y)*log0(1-sigmoid.dot)
    #
    # Take negative average and return
    #
    return(-mean(li))
}
#
# OMITTING SOLUTION CODE
# PROVE THAT IT WORKS:
X <- cbind(rep(1, nrow(dat)), dat$x1, dat$x2)
w <- c(0,-20, 23)
preds <- ifelse(sigmoid(X %*% w) > 0.5, 1, 0)
table(preds, dat$y)
#
# Plot the loss
#
lossPlot <- function(d){
    lossf <- function(a,b){
        w <- c(a, b, d)
        return(loss(dat$y, X, w))
    }
    a <- seq(-5, 5, length = 100)
    b <- seq(-30, 30, length = 100)
    z <- matrix(nrow = 100, ncol = 100)
    for(ii in 1:100){
        for(jj in 1:100){
            z[ii,jj] <- lossf(a[ii],b[jj])
        }
    }
    persp(a,b,z, phi = 45, theta = 45, 
        main = paste("Plot of restricted loss function with third coordinate", d)
        )
}
lossPlot(5)
lossPlot(23)

Thanks!

  • It looks like you're encountering some numeric issues when the arguments to $\exp$ get very large. – Matthew Drury Aug 02 '17 at 23:10
  • As others have pointed out: 1. the hard clamping of the sigmoid outside $[-18, 18]$ leads to ill-conditioning; 2. $\log(0)$ should be $-\infty$, but you returned 0 instead. If you want to smooth this behaviour, you should return a large negative number. Returning 0 corrupts the convexity of the log function around 0, which in turn corrupts the convexity of the logistic regression loss. – soloice Mar 31 '19 at 18:29
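
Following up on that comment, a numerically stable version of the loss could be sketched with base R's plogis() (log.p = TRUE) in place of the clamped sigmoid and the log0() hack; stableLoss below is not part of the original code, just an illustration:

stableLoss <- function(y, x, w){
    # Linear predictor
    eta <- as.vector(x %*% w)
    # plogis(eta, log.p = TRUE) computes log(sigmoid(eta)) without overflow,
    # and log(1 - sigmoid(eta)) = plogis(-eta, log.p = TRUE)
    return(-mean(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE)))
}

With this version the restricted loss surface should again be convex, since the exact negative log-likelihood is convex.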