I created a dummy dataset and compared the performance of SKLearn LinearRegression and Keras. Why is Keras producing horrible results compared to Linear Regression?
Code:
# Create Dataset
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=5000, n_features=10, noise=0.1)
# Build Linear Regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
lr = LinearRegression()
lr.fit(X, y)
prediction_lr = lr.predict(X)
# Build Keras Linear Regression
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(1, activation='relu', input_dim=10))
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X, y, epochs=100, verbose=0)
prediction_nn = model.predict(X)
print(f'LR MSE: {mean_squared_error(y, prediction_lr)}')
print(f'NN MSE: {mean_squared_error(y, prediction_nn)}')
Output:
LR MSE: 0.010068399696132291
NN MSE: 26936.27829985695
Why is there such a dramatic difference in MSE? How can we replicate Linear Regression using Keras?
Thanks
Comments:
You have a relu activation function in your output layer. You probably want a linear activation function that allows for values less than zero.
I disagree with the closure and think this is a specific problem that could generate an answer getting into what a neural network does to replicate linear regression (and I think this because I am writing such an answer in my head).
sklearn uses regularization by default, though regularization can be disabled (unlike in older versions). Disable the regularization to do ordinary least squares. – Dave Feb 07 '22 at 17:27
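For concreteness, here is a minimal sketch of the fix the first comment suggests. relu forces every prediction to be non-negative, while make_regression produces targets centered on zero, so roughly half of them can never be fit; a linear activation lets the single Dense unit compute w·x + b, the same model class as LinearRegression. The Adam optimizer, its learning rate, the epoch count, and the name model_lin are illustrative assumptions, not part of the original post:
from keras import models
from keras import layers
from keras import optimizers
# A sketch, not the original poster's code: linear activation so the single
# Dense unit computes w.x + b, the same model class as LinearRegression.
# Optimizer, learning rate, and epoch count are assumed choices and may need tuning.
model_lin = models.Sequential()
model_lin.add(layers.Dense(1, activation='linear', input_dim=10))
model_lin.compile(optimizer=optimizers.Adam(learning_rate=0.01), loss='mse')
model_lin.fit(X, y, epochs=200, verbose=0)  # reuses X, y from the snippet above
print(f'NN (linear) MSE: {mean_squared_error(y, model_lin.predict(X))}')
Even then, expect the Keras MSE to land near, not exactly at, the sklearn value: LinearRegression solves least squares in closed form, while Keras only approaches the same minimum by mini-batch gradient descent.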