I'm trying to implement an LSTM model for text classification where each sentence is about 1500 words long. I converted each sentence to a sequence of values and fed it to the LSTM, but the gradients are becoming zero and I'm unable to fix it. Why is the LSTM facing a vanishing gradient problem?
The input is a set of {text, label} pairs, for example:
text1 = "the length of sentences is 1200 words", label = 0
text2 = "the length of sentences is 1500 words", label = 1
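For illustration, here is a minimal sketch of how text like this might be turned into padded index sequences plus lengths. The question does not show the actual preprocessing, so the whitespace tokenizer, the `PAD`/`UNK` indices, and the helper names `build_vocab` and `encode_batch` are all assumptions.

```python
# Hypothetical preprocessing sketch: whitespace tokenization and index lookup.
# PAD/UNK indices and both helper functions are assumptions, not from the question.
PAD, UNK = 0, 1

def build_vocab(texts):
    """Assign an integer index to every distinct whitespace-separated token."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode_batch(texts, vocab):
    """Return right-padded index sequences and their true (unpadded) lengths."""
    seqs = [[vocab.get(tok, UNK) for tok in t.split()] for t in texts]
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [s + [PAD] * (max_len - len(s)) for s in seqs]
    return padded, lengths

texts = ["the length of sentences is 1200 words",
         "the length of sentences is 1500 words"]
labels = [0, 1]

vocab = build_vocab(texts)
padded, lengths = encode_batch(texts, vocab)
```

Wrapped in tensors, `padded` and `lengths` would play the role of `text` and `text_lengths` in `forward`. Note that `pack_padded_sequence` expects lengths sorted in decreasing order unless `enforce_sorted=False` is passed.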
The model is:
    import torch
    import torch.nn as nn

    class classifier(nn.Module):
        # Define all the layers used in the model
        def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
            # Constructor
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embedding_dim)
            self.lstm = nn.LSTM(embedding_dim,
                                hidden_dim,
                                num_layers=n_layers,
                                bidirectional=bidirectional,
                                dropout=dropout,
                                batch_first=True)
            self.act1 = nn.ReLU()
            # hidden_dim * 2 assumes bidirectional=True
            self.fc = nn.Linear(hidden_dim * 2, output_dim)
            self.act2 = nn.LogSoftmax(1)

        def forward(self, text, text_lengths):
            embedded = self.embedding(text)
            # embedded = [batch size, sent_len, emb dim]
            packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu(), batch_first=True)
            packed_output, (hidden, cell) = self.lstm(packed_embedded)
            print("packed", packed_output)
            # Concatenate the last layer's final forward and backward hidden states
            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
            # hidden = [batch size, hid dim * num directions]
            outputs1 = self.act1(hidden)
            dense_outputs1 = self.fc(outputs1)
            outputs = self.act2(dense_outputs1)
            return outputs
output_dim is set to 1 to do binary classification.
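One thing that may be worth checking in this setup: a log-softmax taken over a dimension of size 1 is always 0, so with a single output unit its output cannot depend on the logit. The pure-Python sketch below reproduces the same math as `nn.LogSoftmax(1)` for one row; the `log_softmax` helper and the finite-difference check are illustrative assumptions, and this is a possible contributor to the zero gradients rather than a confirmed diagnosis.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

# One output unit (output_dim = 1): the result is 0.0 whatever the logit is.
single = [log_softmax([z])[0] for z in (-5.0, 0.0, 3.2)]

# Finite-difference estimate of d(output)/d(logit) for the one-unit case.
eps = 1e-4
grad_single = (log_softmax([1.0 + eps])[0] - log_softmax([1.0])[0]) / eps

# Same estimate with two output units, for comparison.
grad_two = (log_softmax([1.0 + eps, 0.0])[0] - log_softmax([1.0, 0.0])[0]) / eps

print(single, grad_single, grad_two)
```

Here `grad_single` is exactly zero while `grad_two` is not, which suggests the output layer alone could block gradients regardless of sequence length. Two commonly used alternatives are two output units with log-softmax and `nn.NLLLoss`, or a single raw logit with `nn.BCEWithLogitsLoss`.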
I tried reducing the length of the sentences to 200 words, but the vanishing gradient problem still occurs.
Any references would be appreciated.
selfintake what data and in what order. – Sycorax Dec 19 '20 at 16:32