I am new to ML and trying to write a very simple model that predicts the future values of a function given its past values. That is, I define a simple cubic function (y = x^3 + x^2 + x) and evaluate it at 100 points from x = -10 to x = 10. I would like to use the first 80 points -- params = y[:80] -- to train some weights via SGD, where the weights are drawn from a normal distribution with shape (20, 80). I chose this shape because weights @ params returns an array of 20 values, which I compare against trgts = y[80:] using torch.nn.MSELoss.
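For reference, here is a quick sanity check of the shapes I have in mind (plain numpy, no training involved):

import numpy as np

x = np.linspace(-10, 10, 100)
y = x**3 + x**2 + x
params = y[:80]                              # the 80 "known" points
trgts = y[80:]                               # the 20 points I want to predict
weights = np.random.normal(0, 3, (20, 80))   # same shape as in the real code below
preds = weights @ params                     # (20, 80) @ (80,) -> (20,)
print(preds.shape, trgts.shape)              # (20,) (20,) -- same length, so MSE is well-defined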
When I run my code, which I post in full below, the loss diverges rather than converging, and I would like to understand why.
import numpy as np
import torch
x = np.linspace(-10,10,100)
y = x**3 + x**2 + x
loss_fn = torch.nn.MSELoss(reduction = 'sum')
def get_trgts(i): return torch.tensor(y[i:])
def get_weights(i,o): return torch.tensor(np.random.normal(0,3,(o,i))).requires_grad_()
def get_params(i): return torch.tensor(y[:i])
weights = get_weights(80,20)
trgts = get_trgts(80)
params = get_params(80)
bias = get_weights(1,1)
lr = 1e-6
def apply_step(weights, bias, prn=True):
    preds = weights @ params + bias     # linear prediction of the last 20 points
    loss = loss_fn(preds, trgts)
    loss.backward()
    with torch.no_grad():               # manual SGD update
        weights -= weights.grad * lr
        bias -= bias.grad * lr
        weights.grad.zero_()
        bias.grad.zero_()
    if prn: print(loss.item())
    return preds
for i in range(10): apply_step(weights, bias)
The printed output in one case is:
1305310646.9058952
184198069792.42996
25992991779912.277
3667984265155172.0
5.176052331083886e+17
7.304152852734643e+19
1.0307208174021644e+22
1.4544950315880012e+24
2.0525012798773587e+26
2.8963739390011883e+28
The loss is evidently diverging.
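Just to quantify that: taking the ratio of consecutive printed losses shows the loss growing by roughly the same factor (about 140x) every step:

losses = [1305310646.9058952, 184198069792.42996, 25992991779912.277, 3667984265155172.0]
print([round(losses[i + 1] / losses[i], 1) for i in range(len(losses) - 1)])
# [141.1, 141.1, 141.1] -- a steady geometric blow-up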
P.S. I understand that in its current form the model is useless; I'm simply using it to learn the principles.