
I learned from this answer and this post that the forget gate in an LSTM controls which information is discarded and which is kept, but I wonder whether the LSTM or the GRU is the minimal/simplest RNN that accomplishes that (resistance to exploding and vanishing gradients). If not, which RNN structure is?

By simplicity I mean fewer connections and fewer matrices; performance is irrelevant here. For instance, the GRU has fewer connections and matrices than the LSTM, so it is simpler. I mention the LSTM and GRU because they are the most popular RNNs for this purpose.
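To make "fewer matrices" concrete, here is a quick parameter count for a single-layer LSTM cell versus a GRU cell (a sketch using PyTorch; the sizes are arbitrary and only for illustration):

```python
import torch.nn as nn

input_size, hidden_size = 32, 64

# LSTM: 4 gate blocks (input, forget, cell candidate, output), i.e. 4 weight
# matrices acting on the input, 4 acting on the hidden state, plus biases.
lstm = nn.LSTM(input_size, hidden_size)

# GRU: 3 gate blocks (reset, update, candidate), so roughly 3/4 of the LSTM's
# parameters at the same sizes.
gru = nn.GRU(input_size, hidden_size)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))  # 4 * (32*64 + 64*64 + 64 + 64) = 25088
print("GRU parameters: ", count(gru))   # 3 * (32*64 + 64*64 + 64 + 64) = 18816
```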

Lerner Zhang
  • A single GRU is "simpler" in the sense that it does not have as many trainable weights and biases as a single LSTM unit. On the other hand, it's commonly found that one needs more GRU units to have comparable performance to a neural network using LSTM units on the same task with the same data. In what sense are you measuring "simplicity"? – Sycorax Aug 24 '21 at 16:52
  • This paper compares a number of different recurrent network architectures and finds no clear winner among them when the networks are compared across diverse tasks. Jozefowicz et al., "An Empirical Exploration of Recurrent Network Architectures" – Sycorax Aug 24 '21 at 21:01
  • And another LSTM architecture search paper, Greff et al., "LSTM: A Search Space Odyssey", where none of the variants significantly improves over the vanilla LSTM. The LSTM with coupled input/forget gates might be noteworthy, because the coupling reduces the parameter count (a sketch of that variant follows these comments). – Sycorax Aug 24 '21 at 21:08
  • @Sycorax I mean fewer connections and fewer matrices, and the performance is irrelevant here. – Lerner Zhang Aug 24 '21 at 22:37
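
For reference, the coupled input/forget gate variant mentioned in the last comment (the "CIFG" variant in Greff et al.) ties the input gate to the forget gate, i_t = 1 - f_t, which drops one of the LSTM's four gate blocks. A minimal single-step sketch in PyTorch; the names and shapes are illustrative, not taken from the paper:

```python
import torch

def cifg_lstm_step(x, h, c, W_x, W_h, b):
    """One step of an LSTM with coupled input/forget gates (i = 1 - f).

    W_x has shape (3*H, input_size), W_h has shape (3*H, H), b has shape
    (3*H,): only three gate blocks (forget, cell candidate, output)
    instead of the usual four.
    """
    H = h.shape[-1]
    gates = x @ W_x.T + h @ W_h.T + b
    f = torch.sigmoid(gates[..., :H])        # forget gate
    g = torch.tanh(gates[..., H:2 * H])      # cell candidate
    o = torch.sigmoid(gates[..., 2 * H:])    # output gate
    c_new = f * c + (1 - f) * g              # input gate replaced by 1 - f
    h_new = o * torch.tanh(c_new)
    return h_new, c_new
```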

0 Answers