I'm trying to learn PyTorch, and to do so I pulled some Forex + COVID data I've used with other models in the past to predict the next-day exchange rate. The data has COVID infection rates for both countries as well as some lag features. I first tried a single-layer network, which gave something reasonable (basically it just predicted the previous day's exchange rate, which is fine). However, the moment I tried to make the network slightly more complicated (literally 2 layers), it decided to be lazy and only guessed the mean of the training data. I did some research to try and fix this and tried many solutions, including:
- Scaling the data (did this before the first model; see the sketch after this list)
- Fiddling with learning rates and weight decay
- Trying different optimizers (SGD, Adam, etc.)
- Playing with different numbers of epochs. With the more complex network, the predictions on the training set only start collapsing to the mean after ~50 epochs; before that they have the shape of the training labels, but with much lower actual values.
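
For reference, the scaling step was along these lines. This is a rough sketch of the idea, not my exact preprocessing: it assumes sklearn's `StandardScaler` and that the target column is named `rate`, as in the code further down.

```python
# Sketch of the scaling step (assumes sklearn's StandardScaler).
import pandas as pd
from sklearn.preprocessing import StandardScaler

train_df = pd.read_csv("resources/processed_data/train.csv")
val_df = pd.read_csv("resources/processed_data/val.csv")
feature_cols = [c for c in train_df.columns if c != "rate"]

scaler = StandardScaler()
# Fit on the training features only, then reuse the same transform on
# the validation split so no statistics leak from val into train.
train_df[feature_cols] = scaler.fit_transform(train_df[feature_cols])
val_df[feature_cols] = scaler.transform(val_df[feature_cols])
```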
But I'm hopelessly stuck. I don't think it's a problem with the data or the loading process, since other models (e.g., LightGBM and the single-layer network) didn't just guess the mean. I can only conclude that either my code is buggy or I'm missing something about neural networks that I should learn. I also understand that other architectures exist (RNN, LSTM, etc.) that might be better suited to the problem at hand, but I'm a beginner at deep learning and am using this exercise as a learning experience.
Code:
```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from statistics import mean
from torch.optim import SGD
from torch.utils.data import Dataset, DataLoader
from tqdm import trange


class RateData(Dataset):
    def __init__(self, data: pd.DataFrame):
        # Features are every column except the target 'rate'; targets stay 2-D
        # so their shape matches the model output.
        self.X = torch.from_numpy(data.drop('rate', axis=1).to_numpy().astype(np.float32))
        self.Y = torch.from_numpy(data[['rate']].to_numpy().astype(np.float32))
        self.n_samples = self.X.shape[0]
        self.n_features = self.X.shape[1]

    def __getitem__(self, index):
        return self.X[index], self.Y[index]

    def __len__(self):
        return self.n_samples
```
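
As a quick sanity check on the loading step (a sketch, not part of the original script; it only assumes the same `train.csv` used below), I can confirm the shapes and dtypes coming out of the Dataset:

```python
# Sanity check: confirm sample shapes and dtypes coming out of the Dataset.
ds = RateData(pd.read_csv("resources/processed_data/train.csv"))
x0, y0 = ds[0]
print(len(ds), ds.n_features)   # sample count and feature count
print(x0.shape, x0.dtype)       # per-sample features: (n_features,), float32
print(y0.shape, y0.dtype)       # per-sample target: (1,), float32
```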
```python
class SimpleNet(nn.Module):
    def __init__(self, input_size):
        super(SimpleNet, self).__init__()
        self.linear1 = nn.Linear(input_size, 10)
        self.relu1 = nn.LeakyReLU()
        self.linear2 = nn.Linear(10, 1)
        # Note: this also applies a LeakyReLU to the regression output.
        self.relu2 = nn.LeakyReLU()

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu1(out)
        out = self.linear2(out)
        out = self.relu2(out)
        return out
```
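
The model is just linear → LeakyReLU → linear → LeakyReLU. A forward-pass shape check (a sketch with a placeholder feature count, just to show one output per sample):

```python
# Sketch: verify the forward pass produces one output per sample.
net = SimpleNet(input_size=8)   # 8 is a placeholder feature count
out = net(torch.randn(4, 8))    # batch of 4 dummy samples
print(out.shape)                # torch.Size([4, 1])
```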
```python
train = RateData(pd.read_csv("resources/processed_data/train.csv"))
train_loader = DataLoader(dataset=train, batch_size=10, shuffle=False)
val = RateData(pd.read_csv("resources/processed_data/val.csv"))
val_loader = DataLoader(dataset=val, batch_size=10, shuffle=False)

model = SimpleNet(train.n_features)
loss = nn.MSELoss()
optimizer = SGD(model.parameters(), lr=.0001, weight_decay=.4)

n = 50
train_loss = [0] * n
val_loss = [0] * n

for epoch in trange(n):
    e_train_loss = []
    e_val_loss = []
    for i, (X, Y) in enumerate(train_loader):
        y_pred = model(X)
        l = loss(y_pred, Y)
        l.backward()
        optimizer.step()
        optimizer.zero_grad()
        e_train_loss.append(l.item())
    # Validation pass: no gradients needed.
    with torch.no_grad():
        for i, (X_v, Y_v) in enumerate(val_loader):
            y_pred_v = model(X_v)
            l_v = loss(y_pred_v, Y_v)
            e_val_loss.append(l_v.item())
    train_loss[epoch] = mean(e_train_loss)
    val_loss[epoch] = mean(e_val_loss)
```
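
To make "guesses the mean" concrete, a check like this sketch compares the spread of the predictions to the spread of the targets on the training set (if the prediction std is near zero while its mean sits at the target mean, the network has collapsed):

```python
# Sketch: quantify the mean-collapse on the training set.
with torch.no_grad():
    preds = model(train.X)
print("target mean / std:    ", train.Y.mean().item(), train.Y.std().item())
print("prediction mean / std:", preds.mean().item(), preds.std().item())
```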
Predicting the values for the training set (this is where the flat mean prediction shows up):
```python
data = pd.read_csv("resources/processed_data/train.csv")
n = data.shape[0]

X_t = torch.from_numpy(data.drop('rate', axis=1).to_numpy().astype(np.float32))
Y_t = torch.from_numpy(data[['rate']].to_numpy().astype(np.float32))

with torch.no_grad():
    y_pred = model(X_t)

plt.plot(range(n), Y_t.numpy(), label="Truth")
plt.plot(range(n), y_pred.numpy(), label="Prediction")
plt.legend()
plt.show()
```