How does the model learn the weights and biases in a Transformer NN architecture?

Asked Jan 09 '24 at 18:00

In the Transformer block, between the input embedding matrix and layer normalization (where each token's data is scaled to mean 0 and standard deviation 1), how does the network learn the weights and biases, and how are they applied to the data?

kms
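The forward pass the question describes can be sketched in a few lines of NumPy. This is a minimal illustration under assumed toy sizes (`vocab_size`, `d_model`) and a random embedding matrix, not a full Transformer. Each token id selects one row of the embedding matrix. LayerNorm then rescales each token vector to mean 0 and std 1 and applies a per-feature gain `gamma` and bias `beta`. The embedding matrix, `gamma`, and `beta` are the learnable weights and biases here. With the common initialization `gamma = 1`, `beta = 0`, the output equals the normalized vector until training changes them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, assumed for illustration only.
vocab_size, d_model = 10, 8

# Learnable weight: the embedding matrix, one row per token id.
embedding = rng.normal(size=(vocab_size, d_model))
# Learnable LayerNorm parameters: per-feature gain and bias.
gamma = np.ones(d_model)   # common initialization: scale of 1
beta = np.zeros(d_model)   # common initialization: shift of 0


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token vector (row) to mean 0 / std 1, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, x_hat


token_ids = np.array([3, 1, 7])
x = embedding[token_ids]             # embedding lookup, shape (3, d_model)
y, x_hat = layer_norm(x, gamma, beta)

print(x_hat.mean(axis=-1))  # close to 0 for every token
print(x_hat.std(axis=-1))   # close to 1 for every token

# Hypothetical values that training might move gamma/beta to:
# the output now has per-token mean 0.5 and std of about 2.
y_scaled, _ = layer_norm(x, np.full(d_model, 2.0), np.full(d_model, 0.5))
```

The normalization step itself has no parameters. Only the affine step after it (`gamma * x_hat + beta`) and the embedding lookup before it carry learnable values.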

backpropagation – Sycorax Jan 09 '24 at 18:26
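The comment points to backpropagation, which works roughly like this. The embedding matrix, the LayerNorm gain and bias, and any later weights are all ordinary parameters. A forward pass computes a loss, and backpropagation applies the chain rule from the loss back to every parameter. Gradient descent then moves each parameter a small step against its gradient. Below is a minimal NumPy sketch under loudly stated assumptions. It uses a toy model (embedding, then LayerNorm, then one linear layer) in place of a real Transformer. It also uses a made-up regression target with mean squared error instead of the usual cross-entropy over tokens. Gradients are written out by hand to expose the chain rule; autograd frameworks such as PyTorch compute the same quantities automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, d_out, eps = 10, 8, 4, 1e-5  # toy sizes (assumed)

# Every learnable parameter of the toy model.
params = {
    "E": rng.normal(size=(vocab_size, d_model)),        # embedding matrix
    "gamma": np.ones(d_model),                          # LayerNorm gain
    "beta": np.zeros(d_model),                          # LayerNorm bias
    "W": rng.normal(scale=0.1, size=(d_model, d_out)),  # linear weights
    "b": np.zeros(d_out),                               # linear bias
}

token_ids = np.array([3, 1, 7, 1])                 # token 1 appears twice
target = rng.normal(size=(len(token_ids), d_out))  # hypothetical target


def forward_backward(p, ids, y):
    # ---- forward pass ----
    x = p["E"][ids]
    mean = x.mean(axis=-1, keepdims=True)
    inv_std = 1.0 / np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    x_hat = (x - mean) * inv_std
    h = p["gamma"] * x_hat + p["beta"]
    out = h @ p["W"] + p["b"]
    loss = np.mean((out - y) ** 2)

    # ---- backward pass: chain rule, last layer first ----
    g_out = 2.0 * (out - y) / out.size
    grads = {"W": h.T @ g_out, "b": g_out.sum(axis=0)}
    g_h = g_out @ p["W"].T
    grads["gamma"] = (g_h * x_hat).sum(axis=0)
    grads["beta"] = g_h.sum(axis=0)
    g_xhat = g_h * p["gamma"]
    n = x.shape[-1]
    # Gradient through the mean/variance normalization.
    g_x = inv_std / n * (n * g_xhat
                         - g_xhat.sum(axis=-1, keepdims=True)
                         - x_hat * (g_xhat * x_hat).sum(axis=-1, keepdims=True))
    # Only looked-up rows get gradient; repeated ids accumulate.
    grads["E"] = np.zeros_like(p["E"])
    np.add.at(grads["E"], ids, g_x)
    return loss, grads


lr = 0.1
losses = []
for step in range(200):
    loss, grads = forward_backward(params, token_ids, target)
    losses.append(loss)
    for name in params:  # plain gradient-descent update
        params[name] -= lr * grads[name]

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

`np.add.at` is used rather than `grads["E"][ids] += g_x` because the buffered fancy-index form would keep only one of the two updates for the repeated token id.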

0 Answers