It is well known that L1-regularized linear regression is equivalent to regression with a Laplace prior on the coefficients, in the sense that the Lasso solution is the MAP estimate under that prior. This is explained here:
https://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/
I would love to make use of this fact and convert a Bayesian model I have with a Laplace prior into a simple Lasso regression using sklearn, which is much faster. However, when I try to follow the formula for converting the Laplace scale b into the Lasso alpha, I do not get the expected results. According to the article above, the conversion from the scale parameter b of the Laplace prior to the alpha of Lasso should be alpha = 2 * sigma^2 / b.
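For reference, here is my sketch of where that formula comes from, writing the penalized objective with the unnormalized sum of squared errors as the article does:

\hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta} \; \log p(y \mid X, \beta, \sigma) + \log p(\beta)
= \arg\min_{\beta} \; \frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 + \frac{1}{b} \lVert \beta \rVert_1
= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \frac{2\sigma^2}{b} \lVert \beta \rVert_1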
Using pymc3 to implement the Bayesian model with a Laplace prior:
import numpy as np
import pymc3 as pm
from sklearn.linear_model import LinearRegression, Lasso
k = 2
n = 150
true_sigma = 1
true_alpha = 5
coefs = np.random.rand(k) * 3
X = np.random.rand(n, k)
y = (X * coefs + true_alpha + np.random.rand(n, 1) * true_sigma).sum(axis=1)
basic_model = pm.Model()
b = 3
with basic_model:
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Laplace("beta", mu=0, b=b, shape=k)
    sigma = pm.HalfNormal("sigma", sigma=3)
    mu = alpha + (beta * X).sum(axis=1)
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=y)

map_estimate = pm.find_MAP(model=basic_model)
Now using Lasso:
reg_alpha = 2 * true_sigma**2 / b
lr = Lasso(alpha=reg_alpha)
lr.fit(X, y)
We can now compare lr.coef_ and map_estimate["beta"].
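For example (assuming the objects defined in the snippets above are still in scope):

print("MAP betas:      ", map_estimate["beta"])
print("Lasso betas:    ", lr.coef_)
print("MAP intercept:  ", map_estimate["alpha"])
print("Lasso intercept:", lr.intercept_)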
I played with the drawn data, sigma, and b, and found that the results are rarely the same. If I manually change reg_alpha, I can find a value that produces similar coefficients, but I cannot find a consistent formula.
Even if this were solved, the theoretical formula involves knowing the true sigma (the noise scale), which we obviously cannot know in practice. Is there no way to convert the Bayesian model with a given b to an equivalent Lasso with the correct alpha?
Edit:
We found a solution. First, there were a few inaccuracies in the pymc3 model that made it not exactly equivalent to Lasso. They don't make much of a difference, but the correct model would be:
with basic_model:
    alpha = pm.Flat("alpha")
    beta = pm.Laplace("beta", mu=0, b=b, shape=k)
    sigma = pm.HalfFlat("sigma")
    mu = (beta * X).sum(axis=1) + alpha
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=y)

map_estimate = pm.find_MAP(model=basic_model)
We can then estimate the MAP using Lasso:
reg_alpha = map_estimate['sigma']**2 / (b * n)
lr = Lasso(alpha=reg_alpha, tol=1e-6, fit_intercept=True)
lr.fit(X, y)
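For reference, the n in the denominator comes from sklearn's parameterization (this is how the objective is stated in the scikit-learn documentation): Lasso minimizes

\frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1

while the MAP estimate under the Laplace prior minimizes, up to additive constants,

\frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 + \frac{1}{b} \lVert \beta \rVert_1

Multiplying the second objective by sigma^2 / n (which does not change the argmin) makes the two coincide when alpha = sigma^2 / (n * b).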
This still requires an estimate of the true error of the model, taken here from map_estimate['sigma'].
However, we can iteratively re-estimate the error with the following code:
tolerance = 1e-6
max_iter = 10
est_sigma = 1  # initial guess for the residual std
for i in range(max_iter):
    # convert the current sigma estimate to a Lasso alpha and refit
    reg_alpha = est_sigma**2 / (b * n)
    lr = Lasso(alpha=reg_alpha, fit_intercept=True)
    lr.fit(X, y)
    # re-estimate sigma from the residuals and stop once it has converged
    new_est_sigma = (lr.predict(X) - y).std()
    if abs(new_est_sigma - est_sigma) < tolerance:
        break
    est_sigma = new_est_sigma
print(lr.coef_, est_sigma)
which runs much faster than pymc3's code. So given a Bayesian regression with a Laplace prior with scale b, it is possible to use Lasso and get similar results, at least 100x faster.
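To check the speed difference on your own data, a rough sketch (using a single Lasso fit as a stand-in for the short loop above, and reusing the objects already defined; actual timings will of course vary):

import time

t0 = time.time()
map_estimate = pm.find_MAP(model=basic_model)
print("pymc3 find_MAP:", time.time() - t0, "seconds")

t0 = time.time()
Lasso(alpha=reg_alpha, fit_intercept=True).fit(X, y)
print("single Lasso fit:", time.time() - t0, "seconds")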