I have two tensors whose MSE is really small (0.04), but when I check their distributions they are very different.
I am using:
torch.nn.MSELoss()(output_tensor, input_tensor)  # evaluates to 0.04
I am wondering how this can happen.
Edit 1: The tensors are the learned representations of words in sentences, so their shape is [batch_size x num_tokens x embed_dimension], in my case [batch_size x 16 x 512]. I want to check whether the two tensors are similar.
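A minimal sketch of one way a low MSE and very different distributions can coexist. It uses synthetic random tensors rather than the real embeddings, and the scale (std of about 0.2), the batch size of 8, and the name relative_mse are assumptions made only for illustration. MSE is an absolute, scale-dependent number: if the target values are small, a nearly constant output already gives an MSE close to the target's variance, which is about 0.2² = 0.04 here. Dividing by the mean squared target, or looking at per-token cosine similarity, makes the mismatch visible.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-ins (assumption), shaped [batch_size x 16 x 512].
input_tensor = 0.2 * torch.randn(8, 16, 512)    # target values with std of about 0.2
output_tensor = 0.01 * torch.randn(8, 16, 512)  # nearly constant, unrelated output

# Plain MSE: small in absolute terms because the values themselves are small.
mse = F.mse_loss(output_tensor, input_tensor)

# Relative MSE: MSE divided by the mean squared value of the target.
# A value near 1 means the output is about as good as predicting all zeros.
relative_mse = mse / input_tensor.pow(2).mean()

# Per-token cosine similarity along the embedding dimension, shape [8 x 16].
cos = F.cosine_similarity(output_tensor, input_tensor, dim=-1)

print(f"MSE: {mse.item():.4f}")
print(f"relative MSE: {relative_mse.item():.4f}")
print(f"mean cosine similarity: {cos.mean().item():.4f}")
print(f"input std: {input_tensor.std().item():.4f}, output std: {output_tensor.std().item():.4f}")
```

In this sketch the MSE comes out near 0.04 even though the output carries no information about the input, so a small MSE on its own does not show that the two tensors are similar. Whether this is what happens in the real model depends on the actual scale of the embeddings.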


I am wondering whether I have used the MSE loss incorrectly or whether this is an actual problem with the model. Because the MSE is very low, the model is not updated and produces the same tensor every time.
– Bishwa Karki Nov 08 '23 at 21:49