Can I use Log Loss to solve a non-linear Classification task?

Question

Assume the following data, where the yellow points represent class $1$ and the purple points represent class $2$. I would like to know, if it is possible to build a linear classifier (sigmoid) by transforming the data accordingly?

To get some intuition for the log loss in combination with the sigmoid function, I have tried to classify the data by adding only $1$ dimension for the bias.

Data :

func = lambda x: np.sin(x) - 0.5 * x ** 3
x1 = np.linspace(-1.5, 1.5, 100)
class1 = np.random.uniform(0.5, 1, size = 50) * 0.5
class2 = np.random.uniform(-1, -0.5, size = 50) * 0.5
class_concat = np.concatenate([class1, class2])
np.random.shuffle(x1)
np.random.shuffle(class_concat)
x2 = func(x1) + class_concat
x3 = np.ones(len(x2))
x = np.vstack([x1, x2, x3]).T
y = np.array([0 if val < 0 else 1 for val in x[:, 1]])

Model:

def sigmoid(z):
return 1 / (1 + np.exp(-z))


def log_loss(pred, label):
return - np.sum((label * np.log(pred)) + (1-label) * (np.log(1 - pred))) / 100



w = np.random.uniform(-0.1, 0.1, size=(3, 1))
num_epochs, lr = 500, 0.004
for _ in range(num_epochs):
z = x @ w
pred = sigmoid(z)
loss = log_loss(pred, y)
grad = -(1/100) * x.T @ (sigmoid(z) - y[:, None]) #
w = w - lr * grad

print(f'loss after {_+1} epochs: {loss}')

As expected, using $3$ weights, ($1$ bias), I do not get good seperation:

My idea was now, to transform the data, for example by expanding it to some polynomial, and thus using more weights:

$ X = w_0 + w_1x + w_2x^2 + w_3x^3 .... w_{n}x^n$. In theory, I intuitively think this could work, since I allow the model a lot of flexibility for each weight to minimize the loss, but it seems hard to optimize the loss function using gradient descent. I would appreciate any input that could guide me in the right direction!

Dave · Accepted Answer · 2023-01-06T21:13:54.100

5

YES

The log loss corresponds to using maximum likelihood estimation for a binomial likelihood (conditional distribution) and does not care at all about the function being used to predict the probability. Logistic regression often uses log loss. Neural networks often use log loss. A logistic regression with nonlinear terms like polynomials or splines could use log loss (I think you intend to do this). It’s even theoretically possible (even if challenging) to use log loss for a linear model of the probability (whereas logistic regression is a linear model of the log odds but a nonlinear model of the probability).

edited Jan 06 '23 at 21:13

answered Jan 06 '23 at 21:08

Dave

62,186

Interesting. But wouldn't it be problematic for example to use a constant learning rate for, for example, $30$ weights? I would like to try to decrease the loss on my own, rather than using libraries in the beginning. – kklaw Jan 06 '23 at 21:15

Can I use Log Loss to solve a non-linear Classification task?

1 Answers1