I have only a basic understanding of neural networks (NN). Recently, I encountered a scenario at my company where a team was using linear regression (LR) to forecast an important continuous parameter. It looked like a classic problem for a NN, so I tried to make a better forecast.
In a Colab notebook, ChatGPT and I built a simple NN. Initially it only marginally improved the mean absolute error (MAE). However, after replacing "not a number" (NaN) values with zeros and standardizing the data, the performance improved significantly:
- LR: MAE = 4.7
- 1st NN: MAE = 4.3
- 2nd NN + replace NaNs with zeros + standardized data: MAE = 1.6
- Edit: LR + replace NaNs with zeros + standardized data: MAE = 2.8 (a sketch of this comparison appears after the NN code below)
Does the fact that the 2nd NN lowers the MAE so much mean it is a good forecast? What should I check or verify before I can say I have found a better forecast?
Code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Masking
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Replace NaN values with 0
X_train = X_train.fillna(0)
X_test = X_test.fillna(0)
# Standardize the data (optional but recommended for neural networks)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Define the model with a Masking layer
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(X_train_scaled.shape[1],)))
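# NOTE: this Masking layer is likely a no-op here: Dense layers do not consume
# Keras masks, and after StandardScaler the zero-filled entries are generally
# no longer exactly 0, so nothing matches mask_value anyway.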
model.add(Dense(64, activation='relu'))
model.add(Dense(42, activation='relu'))
model.add(Dense(1, activation='linear')) # Linear activation for regression
model.summary()
# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')
# Fit the model to the training data
model.fit(X_train_scaled, y_train, epochs=50, batch_size=42, validation_data=(X_test_scaled, y_test))
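# Note: the test set is passed as validation_data, so its score is visible
# during training; if hyperparameters are tuned against it, the final test MAE
# will be optimistic. A separate validation split keeps the test set untouched.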
# Evaluate the model on the test set
y_pred = model.predict(X_test_scaled)
# Calculate evaluation metrics
mae_nn = metrics.mean_absolute_error(y_test, y_pred)
mse_nn = metrics.mean_squared_error(y_test, y_pred)
rmse_nn = np.sqrt(mse_nn)
r2_nn = metrics.r2_score(y_test, y_pred)
explained_variance_nn = metrics.explained_variance_score(y_test, y_pred)
# Print a summary of the evaluation metrics
print(f"MAE: {round(mae_nn, 3)}, RMSE: {round(rmse_nn, 3)}, R^2: {round(r2_nn, 3)}, Explained Variance Score: {round(explained_variance_nn, 3)}")
EDIT: Based on @Stephan Kolassa's answer, I checked a flat zero forecast. It does very badly:
MAE: 7.864, MSE: 221.427, RMSE: 14.88, R^2: -0.388, Explained Variance Score: 0.0
Zero Forecast Code:
import numpy as np
import pandas as pd
from sklearn import metrics
# Create a flat zero forecast
zero_forecast = pd.Series(0, index=y_test.index)
# Calculate evaluation metrics for the zero forecast
mae_zero = metrics.mean_absolute_error(y_test, zero_forecast)
mse_zero = metrics.mean_squared_error(y_test, zero_forecast)
rmse_zero = np.sqrt(mse_zero)
r2_zero = metrics.r2_score(y_test, zero_forecast)
explained_variance_zero = metrics.explained_variance_score(y_test, zero_forecast)
# Print a summary of the evaluation metrics for the zero forecast
print(f"Zero Forecast: MAE: {round(mae_zero, 3)}, MSE: {round(mse_zero, 3)}, RMSE: {round(rmse_zero, 3)}, R^2: {round(r2_zero, 3)}, Explained Variance Score: {round(explained_variance_zero, 3)}")
Regarding what these NaN values are: the data is about users who play a mobile game. Some users are new, so they have no value in some of the fields, for example: #games played in the last session, #games in the session before that, and so on; some users have not yet played X sessions. The team that predicts using LR built several different models in order to avoid NaN values (one model for users with X features, one for users with Y features, ...).
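Since these NaNs mean "the user has not yet played that many sessions", filling them with 0 conflates "missing" with a genuine zero. One alternative, sketched here with sklearn's SimpleImputer applied to the raw features before any fillna (an assumption, not what the team used), is to zero-fill and append missing-indicator columns so the model can tell the two apart:
from sklearn.impute import SimpleImputer
# Fill NaNs with 0 and append a 0/1 indicator column for each feature that had NaNs
imputer = SimpleImputer(strategy='constant', fill_value=0, add_indicator=True)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)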