
I have a NN that I would like to train to square a number. This is a learning exercise for me.

My input is the number to be squared; the output is its square.

Two questions: 1) How can this possibly work? The weights and nodes of the NN are fixed, yet they need to produce the square of an input that isn't.

2) Assuming I am wrong, what is a strategy for choosing the number of nodes and layers for a NN?

    As an example: https://stats.stackexchange.com/questions/299915/how-does-the-rectified-linear-unit-relu-activation-function-produce-non-linear/299933#299933, but a necessary, unstated component of your question is what precision you want in the result and over what interval; the universal approximation theorem lays out technical criteria for NNs to approximate specific functions. – Sycorax Jun 24 '19 at 18:29

2 Answers


The ReLU activation function should take care of this.

ReLU works by fitting short, straight line segments to approximate curves, and that should be able to create a parabola (see the sketch below). Performance will suffer for inputs with very large absolute values, but we know that models won't be perfect.
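To make the short-straight-lines idea concrete, here is a minimal sketch of my own (the helper name and the hand-chosen weights are mine, not learned by a network): a piecewise-linear approximation of $x^2$ built from nothing but ReLUs, with one knot per integer.

import numpy as np

relu = lambda z: np.maximum(0, z)

def relu_parabola(x, n_knots=10):
    # Hand-built piecewise-linear approximation of x**2 using only ReLUs.
    # The slope of x**2 between integers k and k+1 is 2k+1, so the first
    # segment starts with slope 1 and every later knot adds 2 to the slope.
    out = np.zeros_like(x, dtype=float)
    for k in range(n_knots):
        w = 1.0 if k == 0 else 2.0
        out += w * (relu(x - k) + relu(-x - k))  # mirrored term handles negative x
    return out

xs = np.linspace(-3, 3, 13)
print(np.column_stack([xs, xs**2, relu_parabola(xs)]))
# Exact at the integer knots; the worst error between knots is 0.25.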

I was thinking that one hidden layer could take care of this, but reading about the universal approximation theorem (which I suggest doing) shows that we can be more efficient by spreading fewer nodes across multiple hidden layers rather than packing tons of nodes into one hidden layer.
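As a rough illustration of that depth-versus-width point (the sizes 300 and 16 are illustrative choices of mine), compare the parameter counts of a single wide hidden layer and two narrow ones; a deep ReLU net composes its kinks, so the number of linear pieces it can produce grows much faster with depth than with width.

import torch.nn as nn

# One wide hidden layer vs. two narrow hidden layers (illustrative sizes only)
wide = nn.Sequential(nn.Linear(1, 300), nn.ReLU(), nn.Linear(300, 1))
deep = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(wide))  # 901 parameters, yet at most ~301 linear pieces
print(n_params(deep))  # 321 parameters; composition allows far more pieces in principle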

EDIT

I didn't make this clear three years ago. The universal approximation theorem says that we can approximate on a compact set (on the real line, that means a closed and bounded subset of the number line). Once you go past that bound, all bets are off, which is why I said that performance will suffer for inputs with very large absolute values. For a visualization, imagine how an absolute value function ($\vert x\vert = \operatorname{ReLU}(x) + \operatorname{ReLU}(-x)$) could approximate $y=x^2$ for small numbers, such as on $(-1, 1)$, but the approximation is awful for $x=10$, for instance.
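A tiny numerical sketch of that visualization (added by me for concreteness): $\operatorname{ReLU}(x) + \operatorname{ReLU}(-x)$ is a passable stand-in for $x^2$ on $(-1, 1)$, but hopeless at $x = 10$.

relu = lambda z: max(z, 0.0)
abs_approx = lambda x: relu(x) + relu(-x)  # |x| built from two ReLUs

for x in [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]:
    print(f"x = {x:5.1f}   x^2 = {x**2:6.2f}   ReLU(x) + ReLU(-x) = {abs_approx(x):6.2f}")
# Off by at most 0.25 on (-1, 1), but off by 90 at x = 10.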

– Dave

This is an interesting question. I wanted to contribute an answer which shows how we can do this practically in Python, and to call out a few interesting things. I hope the interested reader will take the code, modify it, and experiment themselves. I give a few suggestions for things to play around with at the end.

Python Implementation using PyTorch

The code below creates a neural network using PyTorch. I have used the ReLU function between layers (see comment below). I have tried to strike a balance: a network which is simple and easy to train, but which also does a reasonable job (at least on the interval [0,10]; see comments and graph below).

The model is trained on random data from the range [0,10].

Graphs

This graph shows the predicted (blue) and actual (red) values for unseen random input data from the range [-5, 15]; the shaded band marks the training range [0, 10].

[Graph: ReLU, predicting x squared]

  • It is interesting to note how poorly the model performs outside the region on which it is trained.

Things to experiment with

  • Try other activation functions or combinations (like tanh). If I keep everything in the code below identical but change the activation functions to tanh, we get the result shown below (see the sketch after this list for the exact swap).

[Graph: using tanh]

We can improve the performance with more epochs...

[Graph: using tanh, 500 epochs]

I also note here that the function $x^2$ is non-linear, so you could use that as your activation function - but I do not think that is in the spirit of this question :D

  • See what happens if you use less training data or over a bigger range.
  • See what happens if you change the architecture, for example using fewer layers.
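For the first bullet, the swap is exactly this (a sketch only; everything else in the Code section below stays the same):

# Same architecture as in the Code section below, with tanh activations instead of ReLU
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.Tanh(),
    nn.Linear(16, 16),
    nn.Tanh(),
    nn.Linear(16, 1),
)
# For the "500 epochs" graph, also set n_epochs = 500 before training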

Code

import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

# Create training data: 1000 points drawn uniformly from [0, 10], labelled with their squares
X = torch.distributions.uniform.Uniform(0, 10).sample([1000, 1])
y = X**2

# A small fully connected network: 1 input -> two hidden layers of 16 ReLU units -> 1 output
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn   = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
n_epochs = 150
batch_size = 50

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')

# Example, can we square 3 - looks ok
print(model(torch.tensor([3], dtype=torch.float)))

For all intents and purposes we can assume the data below is all unseen, although by random chance there could be some overlap with the training X.

unseenX = torch.distributions.uniform.Uniform(-5,15).sample([1000,1])

predictions_on_unseenX = model(unseenX)

Plotting

fig, ax = plt.subplots()
plt.scatter(unseenX, unseenX**2, c="red", label="Actual values", s=1)
plt.scatter(unseenX, predictions_on_unseenX.detach(), c="blue", s=1, label="Predictions")
plt.text(0, 100, "Training data was in this range")
plt.title("Using ReLU")
plt.legend()
ax.axvspan(0, 10, alpha=0.5, color='grey')  # shade the training range [0, 10]
plt.show()

Further Reading

An interesting post on why ReLU works, with the top answer focussing on this specific problem. A similar post to this one on Stack Exchange.