
I learned from this answer and this post that the forget gate in an LSTM controls which information is discarded and which is kept, but I wonder whether the LSTM or the GRU is the minimal/simplest RNN that accomplishes that (resistance to exploding and vanishing gradients). If not, which RNN structure is?

By simplicity I mean fewer connections and fewer matrices; performance is irrelevant here. For instance, the GRU has fewer connections and matrices than the LSTM, so it is simpler. I mention the LSTM and GRU because they are the most popular RNNs for this purpose.
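To make "fewer matrices" concrete, here is a quick parameter count for a single-layer LSTM cell versus a GRU cell (a sketch using PyTorch; the sizes are arbitrary and only for illustration):

```python
import torch.nn as nn

input_size, hidden_size = 32, 64

# LSTM: 4 gate blocks (input, forget, cell candidate, output), i.e. 4 weight
# matrices acting on the input, 4 acting on the hidden state, plus biases.
lstm = nn.LSTM(input_size, hidden_size)

# GRU: 3 gate blocks (reset, update, candidate), so roughly 3/4 of the LSTM's
# parameters at the same sizes.
gru = nn.GRU(input_size, hidden_size)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))  # 4 * (32*64 + 64*64 + 64 + 64) = 25088
print("GRU parameters: ", count(gru))   # 3 * (32*64 + 64*64 + 64 + 64) = 18816
```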

Lerner Zhang
  • A single GRU is "simpler" in the sense that it does not have as many trainable weights and biases as a single LSTM unit. On the other hand, it's commonly found that one needs more GRU units to have comparable performance to a neural network using LSTM units on the same task with the same data. In what sense are you measuring "simplicity"? – Sycorax Aug 24 '21 at 16:52
  • This paper compares a number of different recurrent network architectures and finds no clear winner among them when the networks are compared across diverse tasks. Jozefowicz et al., "An Empirical Exploration of Recurrent Network Architectures" – Sycorax Aug 24 '21 at 21:01
  • And another LSTM architecture search paper, Greff et al., "LSTM: A Search Space Odyssey", where none of the variants significantly improves over the vanilla LSTM. The LSTM with coupled input/forget gates might be noteworthy, because the coupling reduces the parameter count (a sketch of that variant follows these comments). – Sycorax Aug 24 '21 at 21:08
  • @Sycorax I mean fewer connections and fewer matrices, and the performance is irrelevant here. – Lerner Zhang Aug 24 '21 at 22:37
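
For reference, the coupled input/forget gate variant mentioned in the last comment (the "CIFG" variant in Greff et al.) ties the input gate to the forget gate, i_t = 1 - f_t, which drops one of the LSTM's four gate blocks. A minimal single-step sketch in PyTorch; the names and shapes are illustrative, not taken from the paper:

```python
import torch

def cifg_lstm_step(x, h, c, W_x, W_h, b):
    """One step of an LSTM with coupled input/forget gates (i = 1 - f).

    W_x has shape (3*H, input_size), W_h has shape (3*H, H), b has shape
    (3*H,): only three gate blocks (forget, cell candidate, output)
    instead of the usual four.
    """
    H = h.shape[-1]
    gates = x @ W_x.T + h @ W_h.T + b
    f = torch.sigmoid(gates[..., :H])        # forget gate
    g = torch.tanh(gates[..., H:2 * H])      # cell candidate
    o = torch.sigmoid(gates[..., 2 * H:])    # output gate
    c_new = f * c + (1 - f) * g              # input gate replaced by 1 - f
    h_new = o * torch.tanh(c_new)
    return h_new, c_new
```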

0 Answers