
I am new to ML and am trying to write a very simple model that predicts the future of a function given its past. That is, I define a simple cubic function (y = x^3 + x^2 + x) and evaluate it on a domain of 100 points from x = -10 to x = 10. I would like to use the first 80 points -- params = y[:80] -- to train some weights with SGD -- weights drawn from a normal distribution with shape (20, 80). I chose this shape because weights @ params returns an array of only 20 items, which can be compared against trgts = y[80:] using torch.nn.MSELoss.

When I run my code, which I will post in full below, my loss function diverges rather than converging. I would like to know why this is.

import numpy as np
import torch

x = np.linspace(-10, 10, 100)
y = x**3 + x**2 + x

loss_fn = torch.nn.MSELoss(reduction = 'sum')

def get_trgts(i): return torch.tensor(y[i:])
def get_weights(i, o): return torch.tensor(np.random.normal(0, 3, (o, i))).requires_grad_()
def get_params(i): return torch.tensor(y[:i])

weights = get_weights(80, 20)
trgts = get_trgts(80)
params = get_params(80)
bias = get_weights(1, 1)

lr = 1e-6

def apply_step(weights, bias, prn=True):
    preds = weights @ params + bias
    loss = loss_fn(preds, trgts)
    loss.backward()
    with torch.no_grad():
        weights -= weights.grad * lr
        bias -= bias.grad * lr
        weights.grad.zero_()
        bias.grad.zero_()
    if prn:
        print(loss.item())
    return preds

for i in range(10):
    apply_step(weights, bias)

The printed output in one case is:

1305310646.9058952
184198069792.42996
25992991779912.277
3667984265155172.0
5.176052331083886e+17
7.304152852734643e+19
1.0307208174021644e+22
1.4544950315880012e+24
2.0525012798773587e+26
2.8963739390011883e+28

The loss is evidently diverging.

P.S. I understand that in its current form, the model is useless, I'm simply using it to learn the principles.

1 Answer


I have discovered the error: changing the learning rate to lr = 1e-7 causes the loss to converge rather than diverge. Thanks to anyone who spent time considering my question.
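For anyone who wants to reproduce this, below is a minimal, self-contained sketch of the same setup with the smaller learning rate. It mirrors the code in the question; the only deliberate change is that the bias is created as a one-element vector so that preds has the same shape as trgts (the (1, 1) bias in the original also works, but only through broadcasting). The exact loss values will vary with the random initialization.

import numpy as np
import torch

# Same cubic as above, sampled at 100 points.
x = np.linspace(-10, 10, 100)
y = x**3 + x**2 + x

params = torch.tensor(y[:80])   # first 80 points, used as the input
trgts = torch.tensor(y[80:])    # last 20 points, used as the targets

weights = torch.tensor(np.random.normal(0, 3, (20, 80))).requires_grad_()
bias = torch.tensor(np.random.normal(0, 3, (1,))).requires_grad_()

loss_fn = torch.nn.MSELoss(reduction='sum')
lr = 1e-7  # small enough that the updates no longer overshoot

for i in range(10):
    preds = weights @ params + bias
    loss = loss_fn(preds, trgts)
    loss.backward()
    with torch.no_grad():
        weights -= weights.grad * lr
        bias -= bias.grad * lr
        weights.grad.zero_()
        bias.grad.zero_()
    print(loss.item())

The reason such a small step is needed is that with reduction='sum' the gradient scales with the magnitudes of params (y reaches roughly -900 at x = -10), so a step of 1e-6 already overshoots. Scaling the inputs down, or using the default mean reduction, would allow a larger learning rate.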