I implemented a simple RNN from scratch (using only the NumPy library) for predicting the next character, and I trained it on a simple text = “hello world”. It works fine, but now I want to train it on a very large text, and I don’t know how I should do that. I understand that I cannot train it on the whole text at once because of the vanishing/exploding gradient problem. So I should train it in small batches, but I still don’t understand how that will work. How will the network learn from all the batches?
Please note that I have implemented the entire RNN from scratch, including backpropagation through time, so I am familiar with the fundamentals.
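
To make the question concrete, here is a minimal sketch of what I understand by "training in small batches". It is only an illustration under my own assumptions, not my actual code: `make_chunks`, `train`, `seq_len`, `hidden_size`, and the `train_step` callback are hypothetical names, and `train_step` stands in for my existing forward pass plus backpropagation through time over one chunk. The idea is that the weights are shared and updated after every chunk, and the last hidden state of one chunk is passed in as the initial hidden state of the next.

```python
import numpy as np

def make_chunks(text, seq_len):
    """Split text into consecutive (inputs, targets) index chunks of at most seq_len."""
    chars = sorted(set(text))
    char_to_ix = {c: i for i, c in enumerate(chars)}
    data = np.array([char_to_ix[c] for c in text])
    chunks = []
    for start in range(0, len(data) - 1, seq_len):
        inputs = data[start:start + seq_len]
        # The target for each position is simply the next character.
        targets = data[start + 1:start + seq_len + 1]
        # The final chunk may be shorter, so trim inputs to match.
        inputs = inputs[:len(targets)]
        chunks.append((inputs, targets))
    return chunks, char_to_ix

def train(text, seq_len, train_step, hidden_size, epochs=1):
    """Run train_step over every chunk, carrying the hidden state forward.

    train_step(inputs, targets, h_prev) is assumed (hypothetical) to run the
    forward pass and BPTT over this one chunk, update the shared weights,
    and return the final hidden state.
    """
    chunks, _ = make_chunks(text, seq_len)
    h = np.zeros((hidden_size, 1))
    for _ in range(epochs):
        h = np.zeros((hidden_size, 1))  # reset at the start of each pass over the text
        for inputs, targets in chunks:
            # Gradients are only propagated inside this chunk (truncated BPTT),
            # but h still carries context from the earlier chunks.
            h = train_step(inputs, targets, h)
    return h
```

With `text = "hello world"` and `seq_len = 4`, this would give three chunks covering the ten input/target pairs. Is this the right way to think about it, given that the same weights are reused and updated for every chunk?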