
How does the readout function work in LSTMs?

Is it only the output (hidden state) of the last layer at timestep t that is transmitted to the first layer at timestep t+1, or also the cell state of the last layer at timestep t?

Any sources for the readout in LSTMs?

Xxxo

1 Answer


It's up to the practitioner, but typically only the hidden states are used by the layer above.
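For concreteness, here is a minimal sketch of two manually stacked LSTM layers; PyTorch is an assumption here (the question names no framework). It shows that the layer above consumes only the per-timestep hidden states of the layer below, while the cell states never leave their own layer:

    # Minimal sketch (PyTorch assumed): two manually stacked LSTM layers.
    # Only the hidden states of layer 1 (`out1`, one per timestep) are fed
    # to layer 2; the cell state `c1` stays inside layer 1.
    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size = 7, 4, 16, 32

    layer1 = nn.LSTM(input_size, hidden_size)
    layer2 = nn.LSTM(hidden_size, hidden_size)

    x = torch.randn(seq_len, batch, input_size)

    out1, (h1, c1) = layer1(x)     # out1: layer-1 hidden states at every timestep
    out2, (h2, c2) = layer2(out1)  # layer 2 reads only out1; c1 is never passed up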

One can use an RNN to produce a single output or a sequence of outputs, as this figure (source) illustrates:

[Figure: the five RNN input/output configurations, left to right: one-to-one, one-to-many, many-to-one, many-to-many, and synced many-to-many]

Each rectangle is a vector and arrows represent functions (e.g. matrix multiplication). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state. From left to right:

  • (1) Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
  • (2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
  • (3) Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment).
  • (4) Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
  • (5) Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video).

Notice that in every case there are no pre-specified constraints on the lengths of the sequences, because the recurrent transformation (green) is fixed and can be applied as many times as we like.
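To make the readout difference concrete, here is a hedged sketch (PyTorch again assumed) of two of these modes: a many-to-one readout keeps only the last hidden state, while a synced many-to-many readout applies the same dense map at every timestep:

    # Sketch: many-to-one vs. synced many-to-many readout from the same LSTM.
    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size, n_classes = 10, 4, 8, 32, 5

    lstm = nn.LSTM(input_size, hidden_size)
    readout = nn.Linear(hidden_size, n_classes)

    x = torch.randn(seq_len, batch, input_size)
    out, _ = lstm(x)               # out: (seq_len, batch, hidden_size)

    many_to_one = readout(out[-1]) # e.g. sentiment: (batch, n_classes)
    many_to_many = readout(out)    # e.g. per-frame labels: (seq_len, batch, n_classes)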

You can also add an attention mechanism, in which case even many-to-one and many-to-many architectures will use all the hidden states (unless it is hard attention), not just the last one.
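As a sketch of what soft attention over the hidden states can look like (the learned query vector and dot-product scoring below are assumptions; many scoring functions exist), the readout becomes a weighted sum of all hidden states rather than just the last one:

    # Sketch: dot-product soft attention over all hidden states h_1..h_T.
    import torch
    import torch.nn as nn

    seq_len, batch, hidden_size = 10, 4, 32

    out = torch.randn(seq_len, batch, hidden_size)     # all hidden states
    query = nn.Parameter(torch.randn(hidden_size))     # learned query (assumption)

    scores = torch.einsum('tbh,h->tb', out, query)     # one score per timestep
    weights = torch.softmax(scores, dim=0)             # attention weights over time
    context = torch.einsum('tb,tbh->bh', weights, out) # weighted sum of hidden states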

Franck Dernoncourt
  • I wrote in the question: "Is it only the output of the last layer at timestep t that is transmitted to the first layer at timestep t+1, or also the cell state of the last layer at timestep t?".

    So, no "layer above".

    In general, your answer does not answer my question at all. Thanks anyway for your effort :)

    – Xxxo Nov 02 '16 at 16:48
  • @Xxxo You don't have any layer after the RNN? – Franck Dernoncourt Nov 02 '16 at 17:05
  • Hey, I have a question about your answer, though. You wrote that typically only the hidden states are used by the layer above. Does this mean that an LSTM layer transmits only the cell state to the next one?

    Or do you mean that hidden_state = [cell_state; time_step_output]?

    In the second case, this would mean that the layer above also gets the cell states of the previous one?

    – Xxxo Nov 02 '16 at 17:07
  • Please read my post again. I'm asking about the transmission of the output of the last layer at time step t to the first layer at time step t+1.

    That would also answer the case where I have another layer (i.e. whether I get all the outputs or just the last one).

    – Xxxo Nov 02 '16 at 17:09
  • @Xxxo by first layer you mean the input layer? – Franck Dernoncourt Nov 02 '16 at 17:11
  • OK. We have 3 LSTM layers, stacked. How does feeding the output of the last layer (i.e. the third layer) at timestep t into the first layer at timestep t+1 work? – Xxxo Nov 02 '16 at 17:12
  • @Xxxo I see. Typically there is one dense layer after the last LSTM layer, and it is the output of that dense layer that is given to the first LSTM layer (e.g. https://pbs.twimg.com/media/Cd5sJkSUYAA71Zf.jpg). – Franck Dernoncourt Nov 02 '16 at 17:40
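A minimal sketch of the architecture described in the last comment (assumptions: PyTorch LSTMCell, and that the dense output serves as the next timestep's input, as in sequence generation): three stacked LSTM layers with a dense readout whose output at step t is fed back into the first layer at step t+1.

    # Sketch: 3 stacked LSTM layers + dense readout, with the dense output
    # at step t fed back as the input of layer 1 at step t+1.
    import torch
    import torch.nn as nn

    size, batch, steps = 32, 4, 10

    cells = nn.ModuleList([nn.LSTMCell(size, size) for _ in range(3)])
    dense = nn.Linear(size, size)

    # one (h, c) pair per layer; each layer keeps its own cell state
    states = [(torch.zeros(batch, size), torch.zeros(batch, size)) for _ in cells]

    x = torch.zeros(batch, size)       # initial input
    for t in range(steps):
        inp = x
        for i, cell in enumerate(cells):
            h, c = cell(inp, states[i])
            states[i] = (h, c)
            inp = h                    # only h moves up the stack
        x = dense(inp)                 # dense output becomes input at t+1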