I realize there are pack_padded_sequence and related utilities for batch training LSTMs, but those take an entire sequence, embed it, and forward it through the LSTM in one call. My LSTM is built so that it takes a single input character at a time, and forward just outputs a categorical distribution at each step of the sequence. So I pad the sequences beforehand so they are all the same length, and then feed each index in sequentially. Unfortunately this also means the first characters are pad characters (because I use pre-padding). Is there a way to get the LSTM not to backprop on inputs that are just pads with this setup?
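For concreteness, the setup described above is presumably something along these lines (the class, parameter names, and pad index here are my own guesses for illustration, not the actual code):

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Character-level LSTM stepped manually, one character per call,
    over pre-padded, equal-length sequences."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, pad_idx=0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_batch, state=None):
        # char_batch: (batch,) character indices for a single time step
        h, c = self.cell(self.embed(char_batch), state)
        return self.out(h), (h, c)  # logits over the vocabulary, new state
```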
1 Answer
We can learn from the implementation of the bidirectional dynamic RNN in TensorFlow: the backward LSTM simply runs over the reversed input. So you can reverse each sequence, pad it at the end, and run the LSTM; once you have the outputs, you reverse them back and apply a mask to zero out the gradients for the pad positions.
Once the mask values for the pads are zero, the gradients are zeroed as well. For a dynamic RNN, the pads also do not affect the final state (c and h), because the recurrence simply stops at the position given by sequence_length in TensorFlow. If you use PyTorch, you just feed the reversed, padded inputs into the API and everything works the same as for a normal sequence input.
PyTorch doesn't seem to support a dynamic RNN in this sense, but that does not affect what you want to do, because "pre-padding" (in your words) just becomes normal post-padding once you reverse your input. A masking sketch follows below.
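A minimal sketch of the masking idea in PyTorch (the function, tensor names, and pad index are assumptions for illustration): compute the per-step loss with reduction="none" and zero it wherever the target is the pad token, so pad steps contribute no gradient.

```python
import torch
import torch.nn.functional as F

PAD_IDX = 0  # assumed pad token index

def masked_step_loss(logits, targets):
    """Cross-entropy for one time step, with pad positions zeroed out.

    logits:  (batch, vocab_size) output of the LSTM at this step
    targets: (batch,) target character indices (PAD_IDX at padded positions)
    """
    loss = F.cross_entropy(logits, targets, reduction="none")  # (batch,)
    mask = (targets != PAD_IDX).float()                        # 1 for real chars, 0 for pads
    # Zeroed losses produce zero gradients, so pad steps do not affect training.
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```

Equivalently, F.cross_entropy(logits, targets, ignore_index=PAD_IDX) achieves the same effect directly; the point is only that steps whose target is a pad must contribute zero loss, and therefore zero gradient.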
- Do you have this for PyTorch? – user8714896 Feb 15 '20 at 08:52
- I thought PyTorch doesn't support that yet; please refer to this: https://discuss.pytorch.org/t/about-the-variable-length-input-in-rnn-scenario/345/9 – Lerner Zhang Feb 15 '20 at 09:07