How to do LASSO regression with a dependent variable that is continuous between 0 and 1

Question

I am trying to do a LASSO regression on some data. However, my dependent variable is between 0 and 1. How do I go about this? Do I just apply a sigmoid function to the regression output?

This will surely force the outcome to the 0-1 range, but I am not sure of the technical implications.

How would you model this data if you weren't interested in penalization during estimation? Answering this question should send you in the right direction. — user795305, Jul 28 '17 at 14:49
I am not sure either. So even if I were to do a straight linear regression, my question would still stand. — Minaj, Jul 28 '17 at 14:58
If $x$ can take any value but $y$ is bounded between 0 and 1, then $y$ isn't a linear function of $x$. You've specified two properties of the function (that $y \in (0,1)$, and something about it being continuous). There are all kinds of crazy looking nonlinear functions that satisfy these properties. To get to the point of fitting a model, you'd have to be more explicit about the type of function you're looking for. — user20160, Jul 28 '17 at 15:35
Is it a continuous proportion or a count proportion you are modelling? — usεr11852, Jul 29 '17 at 08:04

score 5 · Accepted Answer · answered Jul 28 '17 at 15:37

Since the response variable is between 0 and 1, you should perform a beta regression. The package 'gamlss' lets you do that and, in addition, fit the model with a LASSO penalty.

```r
library(betareg)  # provides the GasolineYield data
data(GasolineYield)
library(gamlss)   # provides gamlss(), ri() and the BE (beta) family

# batch is a factor, so cbind() stores its integer codes here
X <- with(GasolineYield, cbind(gravity, pressure, temp10, temp, batch))
# standardise the predictors before penalizing them
sX <- scale(X)
# ridge (L2 penalty); family = BE makes this a beta regression
m1 <- gamlss(yield ~ ri(sX), family = BE, data = GasolineYield)
# lasso (L1 penalty)
m2 <- gamlss(yield ~ ri(sX, Lp = 1), family = BE, data = GasolineYield)
# best subset (L0 penalty)
m3 <- gamlss(yield ~ ri(sX, Lp = 0), family = BE, data = GasolineYield)

# summary
summary(m1)
summary(m2)
summary(m3)

# plotting the coefficients
plot(getSmo(m1))
plot(getSmo(m2))
plot(getSmo(m3))
```

gamlss supports several variants of beta regression; take a look at the GAMLSS manual.
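For comparison with the penalized fits above, and as one answer to the comment asking how you would model this without penalization, here is a minimal sketch of an unpenalized beta regression with the betareg package. The formula `yield ~ batch + temp` is an assumption chosen for illustration, not something the answer prescribes.

```r
library(betareg)
data("GasolineYield", package = "betareg")

# Unpenalized beta regression; the mean is linked to the predictors
# through the default logit link
fit_beta <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(fit_beta)

# The fitted means always lie strictly inside (0, 1)
range(fitted(fit_beta))
```

The penalized gamlss models differ only in shrinking the coefficients; the beta likelihood is what keeps the fitted means inside (0, 1).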

If you apply a logit transformation to the response (the inverse of your sigmoid function, if I understood correctly) and then fit a linear model, Ferrari and Cribari-Neto (2004) note that the resulting errors are typically asymmetric, so inference based on normality can be misleading. — Márcio Augusto Diniz, Jul 28 '17 at 15:46

score 2 · Answer 2 · Haitao Du

I am not sure, but I think we can solve

$$ \text{minimize}~ \|\frac 1 {1+e^{-X\beta}} -y \|_2^2+ \lambda\|\beta\|_1 $$

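where $X$ is the data matrix, $y$ is the response vector, and $\beta$ is the coefficient vector. The $\ell_1$ penalty is convex, but the squared-error term composed with the sigmoid is not convex in $\beta$ in general, so a solver may stop at a local minimum.

The fitted values always stay in the required range:

$$ 0< \frac 1 {1+e^{-X\beta}} < 1$$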
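A minimal base-R sketch of how the objective above could be minimized, assuming proximal gradient descent (ISTA) with a fixed step size. The unpenalized intercept `b0` is an addition not present in the formula, and `sigmoid_lasso` and `soft_threshold` are hypothetical helper names, not functions from any package. Because the squared-error term need not be convex, this only guarantees a stationary point, which can depend on the starting value.

```r
# Hypothetical helpers (not from any package): proximal gradient (ISTA) for
#   sum((sigmoid(b0 + X %*% beta) - y)^2) + lambda * sum(abs(beta))
sigmoid <- function(z) 1 / (1 + exp(-z))

# Proximal operator of k * ||.||_1 (soft-thresholding)
soft_threshold <- function(z, k) sign(z) * pmax(abs(z) - k, 0)

sigmoid_lasso <- function(X, y, lambda, n_iter = 5000) {
  X <- as.matrix(X)
  # Fixed step 1 / ||cbind(1, X)||_2^2: for y in [0, 1] the gradient of the
  # squared-error term is Lipschitz with a smaller constant, so every step
  # decreases the penalized objective
  step <- 1 / norm(cbind(1, X), type = "2")^2
  b0 <- 0
  beta <- rep(0, ncol(X))
  for (i in seq_len(n_iter)) {
    p <- sigmoid(b0 + drop(X %*% beta))
    # Derivative of (p - y)^2 with respect to the linear predictor
    w <- 2 * (p - y) * p * (1 - p)
    b0 <- b0 - step * sum(w)  # intercept: plain gradient step, no penalty
    beta <- soft_threshold(beta - step * drop(crossprod(X, w)), step * lambda)
  }
  list(intercept = b0, beta = beta)
}

# Usage on simulated data where only the first two of five predictors matter
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
y <- sigmoid(0.5 + X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.1))
fit <- sigmoid_lasso(X, y, lambda = 0.5)
round(fit$beta, 3)
```

A fixed step keeps the sketch short; a backtracking line search would usually converge in fewer iterations.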

answered Jul 28 '17 at 15:03 · edited Jul 28 '17 at 15:09

score 0 · Answer 3 · answered Jul 28 '17 at 15:42

Let's say that the true relationship between predictors and response is (mostly) linear. In this case, you could do a regression and then truncate the outputs (i.e., anything below 0 counts as 0, anything above 1 counts as 1). This would be better than passing the linear predictions through a sigmoid function, which would bend a relationship that is linear on the original scale.
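A minimal sketch of the truncation idea on simulated, mostly linear data (the data are an assumption for illustration). `clip01` is a hypothetical helper name; plain `lm` stands in for the regression to keep the sketch dependency-free, and the same clipping would apply to predictions from a LASSO fit.

```r
# Hypothetical helper: clip any regression's predictions to [0, 1]
clip01 <- function(pred) pmin(pmax(pred, 0), 1)

set.seed(42)
n <- 100
x <- runif(n)
# Mostly linear relationship, with y kept inside (0, 1)
y <- pmin(pmax(0.1 + 0.8 * x + rnorm(n, sd = 0.05), 0.001), 0.999)
fit <- lm(y ~ x)

# New points outside the training range produce raw predictions below 0 and above 1
new_x <- data.frame(x = c(-0.5, 0.5, 1.5))
clipped <- clip01(predict(fit, newdata = new_x))
clipped
```

Inside the range of the data the clipping usually does nothing; it only matters when extrapolating.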

If you used a sigmoid function, you'd want to do so while training the model (not simply applying it to a linear regression output); this would be better if your problem is closer to classification (i.e., most of your outputs are near 0 or near 1). (The betareg package manual mentions this idea too).
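To make the distinction concrete, here is a minimal base-R sketch on simulated data (the data-generating model is an assumption for illustration). It compares (a) passing a linear regression's fitted values through the sigmoid with (b) fitting the sigmoid model directly by nonlinear least squares with `nls`. No LASSO penalty is used here; adding one gives the objective in the previous answer.

```r
set.seed(7)
n <- 150
x <- rnorm(n)
# Simulated response in (0, 1): sigmoid of a linear predictor plus noise
y <- plogis(-0.5 + 2 * x + rnorm(n, sd = 0.3))

# (a) Sigmoid applied afterwards to a linear regression fitted on the raw y
post_hoc <- plogis(fitted(lm(y ~ x)))

# (b) Sigmoid built into the model and fitted by least squares
fit_in <- nls(y ~ plogis(b0 + b1 * x), start = list(b0 = 0, b1 = 1))
in_training <- fitted(fit_in)

# Residual sums of squares of the two approaches
c(post_hoc = sum((y - post_hoc)^2), in_training = sum((y - in_training)^2))
```

In (a) the predictions are squeezed toward the middle of (0, 1), so its residual sum of squares is typically much larger than that of (b), which chooses the coefficients with the sigmoid already in place.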

Ultimately, you'd want to use a plot of the data or some knowledge about its structure to make a final decision (per @user20160's comment).
