I've seen some articles about text generation using LSTMs (or GRUs).
Basically, it seems you train them by unrolling them over time and feeding one letter into each input step. But say you trained it on text which includes the string:
"The dog chased the cat"
and also includes the string:
"The lion chased the ca"
Both should be acceptable. But although they are very similar, they differ entirely after the 4th character, so they will result in two very different hidden-state vectors. Further, the longer you unroll it, the more they will differ. How is it then possible for an LSTM to learn that "The [something] chased the cat" is an acceptable phrase?
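To make concrete what I mean by "unrolling", here is a rough sketch of feeding both strings through the same LSTM one character at a time (I'm using PyTorch purely for illustration; the layer sizes and names are arbitrary choices of mine, not from any particular article):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary toy vocabulary and sizes; the exact numbers don't matter here.
corpus = "The dog chased the cat" + "The lion chased the cat"
chars = sorted(set(corpus))
char_to_idx = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

def hidden_states(text):
    # Unroll the LSTM over the string, feeding one character per time step.
    idxs = torch.tensor([[char_to_idx[c] for c in text]])
    outputs, _ = lstm(embed(idxs))   # shape: (1, len(text), 32)
    return outputs[0]                # one hidden vector per character

h_dog = hidden_states("The dog chased the cat")
h_lion = hidden_states("The lion chased the cat")

# Identical over the shared prefix "The ", then they drift apart.
print(torch.allclose(h_dog[3], h_lion[3]))    # True
print(torch.allclose(h_dog[-1], h_lion[-1]))  # False
```

The hidden states agree over "The " and then diverge for the rest of the string, which is exactly the divergence I'm worried about.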
Equally, suppose you train it to learn to pair up parentheses. I can see how you could manually design it to do this, but how exactly could it be trained to do it just by feeding in strings like "(das asdasdas) axd"?
What I'm getting at is that I don't see how it could LEARN any more structure than a Markov model does.
Any ideas?
(Also, I've only ever seen one article that showed an LSTM that can pair parentheses, so has that study ever been replicated?) I get that it is possible for an LSTM to do this, but I don't get how it can learn to do it!
Another way to put it: you usually train a network on an input phrase and then compare its output to the expected phrase to get the error. But in text generation there might be millions of acceptable expected phrases! So how can you compute an error?
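To illustrate, here is the kind of training I'm picturing (again just a sketch in PyTorch with arbitrary sizes, my assumption of what the articles mean): the error at each step is computed against the one continuation that happened to appear in the training text, even though many other continuations would be just as acceptable:

```python
import torch
import torch.nn as nn

# Arbitrary sizes again; only the shape of the loss computation matters.
text = "The dog chased the cat"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
to_logits = nn.Linear(32, len(chars))
loss_fn = nn.CrossEntropyLoss()

idxs = torch.tensor([char_to_idx[c] for c in text])
inputs = idxs[:-1].unsqueeze(0)   # "The dog chased the ca"
targets = idxs[1:]                # "he dog chased the cat" -- the single "expected phrase"

outputs, _ = lstm(embed(inputs))  # (1, steps, 32)
logits = to_logits(outputs[0])    # (steps, vocab): a prediction for each next character
loss = loss_fn(logits, targets)   # error against this one particular continuation
loss.backward()
```

At the step right after "The ", the target here is 'd', but for the lion sentence it would be 'l', so the same prefix gets pushed toward two different "correct" answers. That is exactly what confuses me.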