I've used Lasagne to build an LSTM model to classify words with IOB tags. About 25-40% of the training words belong to the O class, so they all receive the same int32 class number, 126.
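(As a rough check of that imbalance, this is the kind of count I did; `y_train` below is just a stand-in for my flattened int32 label array:)

    from collections import Counter

    # y_train stands in for the flattened int32 array of IOB-tag labels
    y_train = [126, 34, 126, 71, 126, 126, 25]  # toy example
    counts = Counter(y_train)
    print(counts[126] / float(len(y_train)))  # fraction of words labelled O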
Each word is passed through a context-window step to increase the number of features and let neighboring words influence the prediction.
After that, the words (with their context windows) go through a word-embedding step before being fed to the model.
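Roughly, the context-window step works like this (the window size and padding index here are illustrative, not my exact values):

    def context_window(sentence, win=3, pad=-1):
        # For each word index in `sentence`, return the `win` indices
        # centered on it, padding at the sentence edges with `pad`.
        assert win % 2 == 1, "window size must be odd"
        half = win // 2
        padded = [pad] * half + list(sentence) + [pad] * half
        return [padded[i:i + win] for i in range(len(sentence))]

    # context_window([4, 8, 15], win=3) -> [[-1, 4, 8], [4, 8, 15], [8, 15, -1]]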
During the first training iterations, my model assigns different classes to the words, but then it starts classifying most of the words with the same class:
    [ 54 9 119 41 77 1 1 96 96 84 84 96 96 96 96 45 74 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 34]
    [ 54 85 7 119 22 7 115 84 62 62 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71 71]
    [ 85 1 83 113 13 36 82 58 126 2 2 17 19 117 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25]
After some training, it starts classifying every word as 126, the index of the O class.
It looks like a hyperparameter configuration problem, but I don't have a clue how to fix it. Can someone give me a hint?
I'm already using a reshape layer between the LSTM hidden layer and the output layer with the softmax activation function.
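For context, the layer stack looks roughly like this (the names and sizes are illustrative placeholders, not my exact code):

    import theano.tensor as T
    import lasagne

    N_VOCAB, EMB_DIM, N_HIDDEN, N_CLASSES = 572, 100, 200, 127  # placeholder sizes

    # (batch, sequence) matrix of word indices
    l_in = lasagne.layers.InputLayer(shape=(None, None), input_var=T.imatrix('x'))
    # embed each word index into a dense vector
    l_emb = lasagne.layers.EmbeddingLayer(l_in, input_size=N_VOCAB, output_size=EMB_DIM)
    # one LSTM hidden state per time step
    l_lstm = lasagne.layers.LSTMLayer(l_emb, num_units=N_HIDDEN)
    # flatten (batch, seq, hidden) to (batch*seq, hidden) so the softmax
    # output layer classifies every time step independently
    l_reshape = lasagne.layers.ReshapeLayer(l_lstm, (-1, N_HIDDEN))
    l_out = lasagne.layers.DenseLayer(l_reshape, num_units=N_CLASSES,
                                      nonlinearity=lasagne.nonlinearities.softmax)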
Here is the main code. You only need Lasagne installed and the ATIS data to run it.
Thank you.