I'm trying to learn PyTorch, and to do so I pulled some Forex + COVID data I've used with other models in the past to predict the next-day exchange rate. The data has COVID infection rates for both countries as well as some lag features. I first tried a single-layer network, which gave something reasonable (basically it just predicted the previous day's exchange rate, which is fine). However, the moment I tried to make the network slightly more complicated (literally 2 layers), it decided to be lazy and only guessed the mean of the training data. I did some research to try and fix this and tried many solutions, including:
- Scaling the data (did this before the first model; see the sketch after this list)
- Fiddling with learning rates and weight decay
- Trying different optimizers (SGD, Adam, etc.)
- Playing with different numbers of epochs. With the more complex network, the predictions on the training set only start collapsing to the mean after ~50 epochs; before that they have the shape of the training labels, but with much lower actual values.
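
For reference, the scaling step was along these lines. This is a rough sketch of the idea, not my exact preprocessing: it assumes sklearn's `StandardScaler` and that the target column is named `rate`, as in the code further down.

```python
# Sketch of the scaling step (assumes sklearn's StandardScaler).
import pandas as pd
from sklearn.preprocessing import StandardScaler

train_df = pd.read_csv("resources/processed_data/train.csv")
val_df = pd.read_csv("resources/processed_data/val.csv")
feature_cols = [c for c in train_df.columns if c != "rate"]

scaler = StandardScaler()
# Fit on the training features only, then reuse the same transform on
# the validation split so no statistics leak from val into train.
train_df[feature_cols] = scaler.fit_transform(train_df[feature_cols])
val_df[feature_cols] = scaler.transform(val_df[feature_cols])
```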
But I'm hopelessly stuck. I don't think it's a problem with the data or the loading process, since other models (e.g., LightGBM and the single-layer network) didn't just guess the mean. I can only conclude that either my code is buggy or I'm missing something about neural networks that I should learn. I also understand that other architectures exist (RNN, LSTM, etc.) that might be better suited to the problem at hand, but I'm a beginner at deep learning and am using this exercise as a learning experience.
Code:
```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from statistics import mean
from torch.optim import SGD
from torch.utils.data import Dataset, DataLoader
from tqdm import trange


class RateData(Dataset):
    def __init__(self, data: pd.DataFrame):
        # Features are every column except the target 'rate'; targets stay 2-D
        # so their shape matches the model output.
        self.X = torch.from_numpy(data.drop('rate', axis=1).to_numpy().astype(np.float32))
        self.Y = torch.from_numpy(data[['rate']].to_numpy().astype(np.float32))
        self.n_samples = self.X.shape[0]
        self.n_features = self.X.shape[1]

    def __getitem__(self, index):
        return self.X[index], self.Y[index]

    def __len__(self):
        return self.n_samples
```
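
As a quick sanity check on the loading step (a sketch, not part of the original script; it only assumes the same `train.csv` used below), I can confirm the shapes and dtypes coming out of the Dataset:

```python
# Sanity check: confirm sample shapes and dtypes coming out of the Dataset.
ds = RateData(pd.read_csv("resources/processed_data/train.csv"))
x0, y0 = ds[0]
print(len(ds), ds.n_features)   # sample count and feature count
print(x0.shape, x0.dtype)       # per-sample features: (n_features,), float32
print(y0.shape, y0.dtype)       # per-sample target: (1,), float32
```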
```python
class SimpleNet(nn.Module):
    def __init__(self, input_size):
        super(SimpleNet, self).__init__()
        self.linear1 = nn.Linear(input_size, 10)
        self.relu1 = nn.LeakyReLU()
        self.linear2 = nn.Linear(10, 1)
        # Note: this also applies a LeakyReLU to the regression output.
        self.relu2 = nn.LeakyReLU()

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu1(out)
        out = self.linear2(out)
        out = self.relu2(out)
        return out
```
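
The model is just linear → LeakyReLU → linear → LeakyReLU. A forward-pass shape check (a sketch with a placeholder feature count, just to show one output per sample):

```python
# Sketch: verify the forward pass produces one output per sample.
net = SimpleNet(input_size=8)   # 8 is a placeholder feature count
out = net(torch.randn(4, 8))    # batch of 4 dummy samples
print(out.shape)                # torch.Size([4, 1])
```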
```python
train = RateData(pd.read_csv("resources/processed_data/train.csv"))
train_loader = DataLoader(dataset=train, batch_size=10, shuffle=False)
val = RateData(pd.read_csv("resources/processed_data/val.csv"))
val_loader = DataLoader(dataset=val, batch_size=10, shuffle=False)

model = SimpleNet(train.n_features)
loss = nn.MSELoss()
optimizer = SGD(model.parameters(), lr=.0001, weight_decay=.4)

n = 50
train_loss = [0] * n
val_loss = [0] * n

for epoch in trange(n):
    e_train_loss = []
    e_val_loss = []
    for i, (X, Y) in enumerate(train_loader):
        y_pred = model(X)
        l = loss(y_pred, Y)
        l.backward()
        optimizer.step()
        optimizer.zero_grad()
        e_train_loss.append(l.item())
    # Validation pass: no gradients needed.
    with torch.no_grad():
        for i, (X_v, Y_v) in enumerate(val_loader):
            y_pred_v = model(X_v)
            l_v = loss(y_pred_v, Y_v)
            e_val_loss.append(l_v.item())
    train_loss[epoch] = mean(e_train_loss)
    val_loss[epoch] = mean(e_val_loss)
```
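
To make "guesses the mean" concrete, a check like this sketch compares the spread of the predictions to the spread of the targets on the training set (if the prediction std is near zero while its mean sits at the target mean, the network has collapsed):

```python
# Sketch: quantify the mean-collapse on the training set.
with torch.no_grad():
    preds = model(train.X)
print("target mean / std:    ", train.Y.mean().item(), train.Y.std().item())
print("prediction mean / std:", preds.mean().item(), preds.std().item())
```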
Predicting the values for the training set (this is where the flat mean prediction shows up):
```python
data = pd.read_csv("resources/processed_data/train.csv")
n = data.shape[0]

X_t = torch.from_numpy(data.drop('rate', axis=1).to_numpy().astype(np.float32))
Y_t = torch.from_numpy(data[['rate']].to_numpy().astype(np.float32))

with torch.no_grad():
    y_pred = model(X_t)

plt.plot(range(n), Y_t.numpy(), label="Truth")
plt.plot(range(n), y_pred.numpy(), label="Prediction")
plt.legend()
plt.show()
```