I am trying to train a small MLP in PyTorch. Here is the code for the net:
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.ln_1 = nn.Linear(600, 512)
        self.ln_2 = nn.Linear(512, 256)
        self.ln_3 = nn.Linear(256, 128)
        self.ln_4 = nn.Linear(128, 64)
        self.ln_5 = nn.Linear(64, 64)
        self.ln_6 = nn.Linear(64, 32)
        self.ln_7 = nn.Linear(32, 16)
        self.ln_8 = nn.Linear(16, 1)
        self.sig = nn.Sigmoid()  # defined but not used in forward
        self.relu = nn.ReLU()    # defined but not used in forward

    def forward(self, x):
        print(f"Input: {x}")
        x = torch.sigmoid(self.ln_1(x))
        print(f"After Layer 1: {x}")
        x = torch.sigmoid(self.ln_2(x))
        print(f"After Layer 2: {x}")
        x = torch.sigmoid(self.ln_3(x))
        x = torch.sigmoid(self.ln_4(x))
        x = torch.sigmoid(self.ln_5(x))
        x = torch.sigmoid(self.ln_6(x))
        x = torch.sigmoid(self.ln_7(x))
        print(f"After Layer 7: {x}")
        x = torch.sigmoid(self.ln_8(x))
        print(f"After Layer 8 (out): {x}")
        return x
```
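The training loop below refers to the model instance as fin; the instantiation is not shown here, but it is just along these lines (a minimal sketch):

```python
# Sketch of the instantiation assumed by the training loop below;
# "fin" is the Net instance it calls.
fin = Net()
fin.train()
```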
The input, dim = 12x600 (the print below shows an extra leading dimension of 1, i.e. shape [1, 12, 600]), looks like this:

```
tensor([[[-0.0013, -0.0038, -0.0044, ..., 0.0002, 0.0128, 0.0198],
[-0.0043, -0.0026, -0.0003, ..., -0.0002, 0.0038, 0.0057],
[ 0.0364, 0.0272, 0.0145, ..., 0.0054, 0.0084, 0.0114],
...,
[-0.0179, -0.0068, 0.0040, ..., -0.0222, -0.0262, -0.0192],
[-0.0059, -0.0049, -0.0024, ..., -0.0403, -0.0379, -0.0358],
[ 0.0007, 0.0017, 0.0024, ..., -0.0024, -0.0036, -0.0040]]])
```
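For what it's worth, the triple brackets above suggest that extra leading batch dimension; a quick sanity check (a sketch, assuming the tensor is named inputs as in the training loop) would be:

```python
# Hypothetical shape check on the tensor printed above.
print(inputs.shape)  # expected: torch.Size([1, 12, 600])
```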
And the output, 12x1 (again with a leading dimension of 1):

```
tensor([[[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051],
[0.0051]]], grad_fn=<SigmoidBackward0>)
```
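To rule out a rounding artifact in the print, the 12 outputs can be checked for being numerically identical (a minimal sketch, assuming the tensor above is named outputs):

```python
# A single unique value here means the outputs really have collapsed.
print(torch.unique(outputs))
```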
The loss and optimizer look like this, and I tried different lr values to no avail:

```python
import torch.optim as optim

criterion = nn.MSELoss()
# parameters() must be called on the model instance (fin), not on the Net class
optimizer = optim.Adam(fin.parameters(), lr=0.0001)
```
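By "different lr values" I mean sweeps along these lines (a sketch; the exact values varied):

```python
# Hypothetical sweep; the model is re-created per trial so runs start fresh.
for lr in (1e-1, 1e-2, 1e-3, 1e-4):
    fin = Net()
    optimizer = optim.Adam(fin.parameters(), lr=lr)
    # ... run the training loop below ...
```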
The training loop is designed the following way:

```python
for epoch in range(epochs):
    running_loss = 0.0
    inp = inputs
    feat = features

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = fin(inp)
    loss = criterion(outputs.float(), feat)
    loss.backward()
    optimizer.step()

    running_loss += loss.item()  # accumulate for logging
```
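A per-layer gradient check can be dropped in right after loss.backward() to see whether anything useful reaches the early layers (a minimal diagnostic sketch):

```python
# Print each parameter's gradient norm after backward(); near-zero norms
# in the early layers would point at vanishing gradients.
for name, param in fin.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm().item())
```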
I want to know why, despite different values being input to the model, I end up with the same output for all 12 inputs. Thank you.
Below is an example of the layer outputs at different points:

```
Input: tensor([[-0.0036, -0.0060, -0.0065, ..., 0.0006, -0.0021, -0.0043],
[ 0.0061, 0.0047, 0.0054, ..., 0.0002, -0.0016, -0.0028],
[ 0.0012, 0.0028, 0.0037, ..., -0.0062, -0.0059, -0.0067],
...,
[-0.0002, 0.0010, 0.0005, ..., -0.0011, -0.0011, 0.0002],
[-0.0003, -0.0012, -0.0010, ..., -0.0022, -0.0002, 0.0020],
[ 0.0005, -0.0013, -0.0027, ..., 0.0037, 0.0047, 0.0045]])
After Layer 1:tensor([[0.4986, 0.4950, 0.5072, ..., 0.4992, 0.5060, 0.4963],
[0.4986, 0.4958, 0.5067, ..., 0.4999, 0.5050, 0.4980],
[0.4995, 0.4963, 0.5060, ..., 0.4994, 0.5058, 0.4973],
...,
[0.4983, 0.4964, 0.5053, ..., 0.4992, 0.5055, 0.4976],
[0.4982, 0.4944, 0.5061, ..., 0.4989, 0.5048, 0.4972],
[0.4983, 0.4949, 0.5056, ..., 0.4997, 0.5058, 0.4970]],
grad_fn=<SigmoidBackward0>)
After Layer 2:tensor([[0.4716, 0.3847, 0.3724, ..., 0.5870, 0.5455, 0.4221],
[0.4716, 0.3848, 0.3725, ..., 0.5867, 0.5460, 0.4221],
[0.4715, 0.3848, 0.3724, ..., 0.5868, 0.5460, 0.4222],
...,
[0.4716, 0.3849, 0.3724, ..., 0.5871, 0.5461, 0.4220],
[0.4716, 0.3848, 0.3724, ..., 0.5869, 0.5461, 0.4221],
[0.4717, 0.3848, 0.3725, ..., 0.5869, 0.5462, 0.4222]],
grad_fn=<SigmoidBackward0>)
After Layer 7:tensor([[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706],
[0.4533, 0.4607, 0.5720, 0.4933, 0.4806, 0.5116, 0.6091, 0.4371, 0.5633,
0.5099, 0.6671, 0.6205, 0.4186, 0.4554, 0.6118, 0.4706]],
grad_fn=<SigmoidBackward0>)
After Layer 8 (out):tensor([[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051],
[0.5051]], grad_fn=<SigmoidBackward0>)
```