6

I've often seen statements like this in the literature: "we used x LSTM cells in our implementation". I don't understand the point of stacking several LSTMs: why isn't a single cell enough, given that it already receives the cell state and the hidden state from the previous time step?

For example, see page 4 of this paper: https://arxiv.org/pdf/1612.04928.pdf

I can see the advantage of running two cells in parallel, but not of stacking them.

Tiffany
  • Your question is automatically flagged as low-quality because it is so short. Can you extend your question please? – Ferdi Sep 07 '17 at 12:07
  • Thank you for extending your question. Now it looks much better. If you still remember the paper where you read this sentence it would be awesome if you provide a link. – Ferdi Sep 07 '17 at 12:20
  • 1
    Yes, no problem, I edited it again. – Tiffany Sep 07 '17 at 12:31

1 Answer

1

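Each layer has only one cell, which is applied at every time step. For more information, read this. Stacking several LSTM layers is meant to let the model extract more abstract information: each layer takes the sequence of hidden states from the layer below as its input. I think this question and this answer explain the issue in detail.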
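To make the wiring concrete, here is a minimal sketch of a two-layer stacked LSTM in plain Python. It is not taken from the linked paper: the class `LSTMCell`, the helper `run_stacked`, the layer sizes, and the random weights are all assumptions made for illustration, and nothing is trained. It only shows that each layer is one cell reused across time steps, and that the second layer reads the hidden states of the first rather than the raw input.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell (hypothetical helper): one set of weights reused at every time step."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f),
        # candidate (g), and output (o).
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input and previous hidden state

        def affine(gate):
            return [sum(w * v for w, v in zip(row, z)) + bias
                    for row, bias in zip(self.W[gate], self.b[gate])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        g = [math.tanh(a) for a in affine("g")]
        o = [sigmoid(a) for a in affine("o")]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_stacked(cells, sequence):
    """Run `sequence` through the stack; layer k reads layer k-1's hidden states."""
    inputs = sequence
    for cell in cells:
        h = [0.0] * cell.hidden_size
        c = [0.0] * cell.hidden_size
        outputs = []
        for x in inputs:
            h, c = cell.step(x, h, c)
            outputs.append(h)
        inputs = outputs  # the next layer consumes this layer's hidden states
    return inputs


rng = random.Random(0)
# Assumed sizes: 3 input features, 4 hidden units in layer 1, 2 in layer 2.
cells = [LSTMCell(3, 4, rng), LSTMCell(4, 2, rng)]
sequence = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
top = run_stacked(cells, sequence)
print(len(top), len(top[0]))
```

With a single cell, the only transformation between input and output at each time step is one gated update. In the stack, the second cell works on a sequence that the first cell has already processed, which is the usual argument for why deeper layers can represent more abstract features.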

Lerner Zhang