
I'm trying to implement an LSTM model for text classification where each sentence is about 1500 words long. I convert each sentence to a sequence of integer values and feed it to the LSTM, but the gradients become zero and I have been unable to fix it. Why is the LSTM running into a vanishing gradient problem?

The input is a set of {text, label} pairs, for example:

text1 = "the length of sentences is 1200 words", label = 0
text2 = "the length of sentences is 1500 words", label = 1
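For concreteness, the conversion from a sentence to a sequence of values could look roughly like the sketch below. The helpers `build_vocab`, `encode`, and `pad_batch`, the special indices, and the second example sentence are hypothetical and stand in for preprocessing that is not shown here.

```python
# Hypothetical preprocessing sketch: map words to integer ids and pad a batch.
PAD, UNK = 0, 1  # assumed special indices, not taken from the original code


def build_vocab(sentences):
    """Assign an integer id to every distinct lowercase word."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Convert one sentence to a list of integer ids."""
    return [vocab.get(word, UNK) for word in sentence.lower().split()]


def pad_batch(encoded):
    """Right-pad every sequence to the longest one; also return true lengths."""
    lengths = [len(seq) for seq in encoded]
    max_len = max(lengths)
    padded = [seq + [PAD] * (max_len - len(seq)) for seq in encoded]
    return padded, lengths


texts = ["the length of sentences is 1200 words", "a shorter sentence"]
vocab = build_vocab(texts)
batch, lengths = pad_batch([encode(t, vocab) for t in texts])
```

The padded `batch` and the `lengths` list would then be turned into tensors and passed as `text` and `text_lengths`. The sequences here are ordered by decreasing length, which `pack_padded_sequence` expects by default.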

The model is:

    import torch
    import torch.nn as nn

    class classifier(nn.Module):

        # define all the layers used in the model
        def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim,
                     n_layers, bidirectional, dropout):
            # constructor
            super().__init__()

            self.embedding = nn.Embedding(vocab_size, embedding_dim)

            self.lstm = nn.LSTM(embedding_dim,
                                hidden_dim,
                                num_layers=n_layers,
                                bidirectional=bidirectional,
                                dropout=dropout,
                                batch_first=True)

            self.act1 = nn.ReLU()
            self.fc = nn.Linear(hidden_dim * 2, output_dim)
            self.act2 = nn.LogSoftmax(1)

        def forward(self, text, text_lengths):
            embedded = self.embedding(text)
            # embedded = [batch size, sent_len, emb dim]
            packed_embedded = nn.utils.rnn.pack_padded_sequence(
                embedded, text_lengths.cpu(), batch_first=True)

            packed_output, (hidden, cell) = self.lstm(packed_embedded)
            print("packed", packed_output)

            # concatenate the final hidden states of the two directions
            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
            # hidden = [batch size, hid dim * num directions]

            outputs1 = self.act1(hidden)
            dense_outputs1 = self.fc(outputs1)
            outputs = self.act2(dense_outputs1)

            return outputs

output_dim is set to 1 for binary classification.
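One thing that may be worth checking with this setup: `nn.LogSoftmax(1)` normalizes over dimension 1, and when that dimension has size 1 (as it does with an output dimension of 1) the result is log(1) = 0 for every input. The sketch below uses made-up logits to illustrate that the output is then constant and that no gradient flows back through it. Whether this is the only cause of the zero gradients in the full model is an assumption that would still need to be verified.

```python
import torch
import torch.nn as nn

# Made-up logits standing in for the output of self.fc with output_dim = 1.
logits = torch.tensor([[2.5], [-1.0], [0.3]], requires_grad=True)  # [batch, 1]

log_probs = nn.LogSoftmax(dim=1)(logits)

# Any downstream loss would do; a weighted sum keeps the example short.
weights = torch.tensor([[1.0], [2.0], [3.0]])
loss = (weights * log_probs).sum()
loss.backward()

print(log_probs)    # all zeros, independent of the logits
print(logits.grad)  # all zeros, so nothing upstream receives a gradient
```

With a single output unit, one common alternative is to return the raw logit and train with `nn.BCEWithLogitsLoss`; another is to use two output units and keep `LogSoftmax` together with `nn.NLLLoss`.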

I tried reducing the length of the sentences to 200 words, but the vanishing gradient problem still occurs.
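To see where the gradients vanish, it can help to print the gradient norm of each parameter after a backward pass. The following is a minimal sketch on a tiny stand-in model with random data; the `grad_norms` helper, the layer sizes, and the loss are illustrative assumptions and not part of the model above.

```python
import torch
import torch.nn as nn


def grad_norms(model):
    """Return {parameter name: L2 norm of its gradient} after backward()."""
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.grad is not None}


torch.manual_seed(0)

# Tiny stand-in model: embedding -> LSTM -> linear on the last time step.
embedding = nn.Embedding(50, 8)
lstm = nn.LSTM(8, 16, batch_first=True)
fc = nn.Linear(16, 1)
model = nn.ModuleDict({"embedding": embedding, "lstm": lstm, "fc": fc})

tokens = torch.randint(0, 50, (4, 200))       # [batch, sent_len]
labels = torch.randint(0, 2, (4, 1)).float()  # binary targets

output, _ = lstm(embedding(tokens))
logits = fc(output[:, -1, :])                 # raw logits, no softmax
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()

norms = grad_norms(model)
for name, norm in norms.items():
    print(f"{name}: {norm:.2e}")
```

If every norm is exactly zero, including the final linear layer, the problem is more likely in the output layer or the loss than in the recurrence itself. Norms that shrink only toward the embedding would point to the usual long-sequence vanishing gradient.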

Any reference would be appreciated.

  • This code doesn't actually implement a model because we don't know which attributes of self intake what data and in what order. – Sycorax Dec 19 '20 at 16:32
  • I think the question now at least contains a complete model. Please consider reopening it; I have been searching for the last 3 days, and any kind of reference might be helpful. – SS Varshini Dec 19 '20 at 17:11
  • You appear to be asking us to debug your code -- without even offering information about the input! If you could include a minimal reproducible example this question might work on Stack Overflow. – whuber Dec 19 '20 at 17:48
