Questions tagged [lstm]

A Long Short-Term Memory (LSTM) is a neural network architecture containing recurrent blocks that can remember a value for an arbitrary length of time.

An LSTM has the following core components (not present in vanilla RNNs):

  1. Forget gate: allows the LSTM to forget its past state, or to retain some elements of it
  2. Input gate: decides what part of the new input arriving at the current step should be allowed to influence the cell's state
  3. Output gate: determines what part of the cell's output should be allowed to flow out, typically to be consumed as a prediction

A "cell" is the term used for an individual LSTM unit; the three gates are sketched below.
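
A minimal numpy sketch of one LSTM time step, following the standard gate equations; the weight names (Wf, Uf, bf, …) are illustrative rather than tied to any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Uf, bf, Wi, Ui, bi, Wo, Uo, bo, Wc, Uc, bc):
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)        # forget gate: keep/drop old state
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)        # input gate: admit new information
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)        # output gate: expose the cell
    c_tilde = np.tanh(Wc @ x + Uc @ h_prev + bc)  # candidate cell state
    c = f * c_prev + i * c_tilde                  # updated cell state (the "memory")
    h = o * np.tanh(c)                            # hidden state / output
    return h, c

# Tiny demo: 3 input features, 5 hidden units, random weights.
rng = np.random.default_rng(0)
d_x, d_h = 3, 5
weights = []
for _ in range(4):  # one (W, U, b) triple each for f, i, o and the candidate
    weights += [rng.standard_normal((d_h, d_x)), rng.standard_normal((d_h, d_h)), np.zeros(d_h)]
h, c = lstm_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), *weights)
print(h.shape, c.shape)  # (5,) (5,)
```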
810 questions
4 votes · 2 answers

LSTM: shape of tensors?

I'm trying to understand LSTMs, using for instance http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I get the overall idea, I guess, but I'm not quite sure I get the maths. I'll set up a very simple problem: I have a sequence of numbers and…
user3617487
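
The question is truncated, but the shape bookkeeping it asks about can be illustrated with a short sketch; Keras is an assumption here (the question names no library), and the data are random placeholders:

```python
import numpy as np
from tensorflow.keras import layers, models

timesteps, features, units = 10, 1, 32
X = np.random.rand(100, timesteps, features)  # input tensor: (samples, timesteps, features)
y = np.random.rand(100, 1)                    # one target per sequence

model = models.Sequential([
    layers.LSTM(units, input_shape=(timesteps, features)),  # output tensor: (batch, units)
    layers.Dense(1),                                        # output tensor: (batch, 1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, verbose=0)
```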
4 votes · 1 answer

Why would an LSTM converge to a fixed state when generating sequences?

I want to generate some sequences using LSTMs, like in Karpathy's char-rnn; I do nearly everything identically. During training the network greatly decreases the error over time, until it converges to some local minimum of the error…
user143877
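
One common culprit in char-rnn-style generation is feeding back a greedy argmax at each step, which can drive the output to a fixed point; whether that applies to this asker is an assumption, but the usual remedy is temperature sampling, sketched here:

```python
import numpy as np

def sample_next(probs, temperature=1.0):
    """Draw the next symbol from the network's output distribution.

    temperature -> 0 approaches greedy argmax, which tends to collapse
    generation onto a fixed state; temperature around 0.5-1.0 keeps
    the sampled sequences varied."""
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)

# Demo with a made-up output distribution over 4 symbols:
print(sample_next(np.array([0.7, 0.2, 0.05, 0.05]), temperature=0.8))
```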
2 votes · 0 answers

Do we get the best performance with "batch_size = 1" (especially for LSTM)?

In my experience, choosing batch_size = 1 gives the best result and choosing batch_size = the whole data set size gives the worst, and there seems to be a linear or exponential relation between these two numbers (I mean choosing a number nearer…
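
The claim is easy to test empirically; here is a hedged Keras sketch on hypothetical random data (the asker's actual setup is unknown) that compares extreme batch sizes:

```python
import numpy as np
from tensorflow.keras import layers, models

X = np.random.rand(256, 20, 1)  # placeholder sequences
y = np.random.rand(256, 1)

for batch_size in (1, 32, 256):  # 256 == the whole data set here
    model = models.Sequential([
        layers.LSTM(16, input_shape=(20, 1)),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    hist = model.fit(X, y, batch_size=batch_size, epochs=5, verbose=0)
    print(f"batch_size={batch_size}: final loss {hist.history['loss'][-1]:.4f}")
```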
2 votes · 1 answer

Does each gate have one or two matrices in an LSTM?

According to these equations on Wikimedia, each gate has two weight matrices, W and U respectively, but according to this, from http://colah.github.io/posts/2015-08-Understanding-LSTMs/, each gate has only one weight matrix, W, and in the example code: z =…
Alex Luya • 123 • 4
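
The two notations describe the same computation: a single matrix acting on the concatenation [x, h] is equivalent to separate W and U acting on x and h. A quick numpy check, with dimensions chosen arbitrarily:

```python
import numpy as np

d_x, d_h = 3, 4
rng = np.random.default_rng(0)
x, h = rng.random(d_x), rng.random(d_h)
W, U = rng.random((d_h, d_x)), rng.random((d_h, d_h))

# Two matrices, as in the Wikimedia equations:
z_two = W @ x + U @ h

# One matrix acting on the concatenation [x, h], as in colah's post:
W_cat = np.hstack([W, U])               # shape (d_h, d_x + d_h)
z_one = W_cat @ np.concatenate([x, h])

assert np.allclose(z_two, z_one)        # identical up to notation
```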
2 votes · 2 answers

LSTM with multidimensional input

It's hard to find literature where LSTMs are used with multidimensional input. I know that an LSTM admits multiple time series as input (multidimensional input) with the shape (samples, look back, dimension). Dimension could be electricity demand,…
J.Cirera • 313
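
A minimal sketch of exactly that input layout, assuming Keras (the question quotes its (samples, look back, dimension) convention) and random placeholder data:

```python
import numpy as np
from tensorflow.keras import layers, models

samples, look_back, dimension = 500, 24, 3         # e.g. demand, temperature, price
X = np.random.rand(samples, look_back, dimension)  # (samples, look back, dimension)
y = np.random.rand(samples, 1)                     # next-step target

model = models.Sequential([
    layers.LSTM(32, input_shape=(look_back, dimension)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, verbose=0)
```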
1 vote · 1 answer

What could cause my LSTM loss to decrease then increase?

I'm training 5 stacked Bi-LSTMs on an NLP task. The network fits well with a time series of length 30, and converges to around 0.97 AUROC. However, when I increase the length of the time series to 50, this happens: [loss plot omitted]. I'm not using masking (slows…
1 vote · 0 answers

LSTM Vanishing Gradients

I'm trying to implement an LSTM model for text classification where each sentence is about 1500 words. I converted each sentence to a sequence of values and fed it to the LSTM, but the gradients are becoming zero. I'm unable to fix it. Why is the LSTM facing a vanishing…
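
A first diagnostic step for a report like this is to print per-variable gradient norms; the sketch below assumes TensorFlow/Keras (the question names no framework) and random stand-in data with the stated 1500-step sequences:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(1500, 1)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
X = np.random.rand(8, 1500, 1).astype("float32")           # stand-in sequences
y = np.random.randint(0, 2, size=(8, 1)).astype("float32")

with tf.GradientTape() as tape:
    pred = model(X, training=True)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
grads = tape.gradient(loss, model.trainable_variables)

for var, grad in zip(model.trainable_variables, grads):
    print(var.name, float(tf.norm(grad)))  # near-zero norms confirm vanishing gradients
```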
1 vote · 0 answers

LSTM vector sum of inputs as memory

I want to set up a long short-term memory (LSTM) network to have the vector sum of the inputs, which live in R^d, as its memory (c_t). What choices of weights and activation functions are required to do this?
harry • 11
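
One way to get c_t equal to the sum of x_1..x_t out of the standard update c_t = f ⊙ c_{t-1} + i ⊙ c̃_t is to saturate the forget and input gates at 1 (large positive biases, zero weights) and make the candidate the identity of the input (W_c = I, U_c = 0, linear activation). A numpy sketch of that construction, offered as one possible answer rather than the asker's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 4
rng = np.random.default_rng(0)
xs = [rng.random(d) for _ in range(10)]

big = 50.0            # large positive gate bias: sigmoid(50) is 1 to machine precision
c = np.zeros(d)
for x in xs:
    f = sigmoid(np.zeros(d) + big)  # forget gate ~ 1: keep all of c_{t-1}
    i = sigmoid(np.zeros(d) + big)  # input gate  ~ 1: admit all of the candidate
    c_tilde = np.eye(d) @ x         # candidate = identity(x): W_c = I, U_c = 0, linear
    c = f * c + i * c_tilde         # c_t ~ c_{t-1} + x_t

assert np.allclose(c, sum(xs))      # the memory is the running vector sum
```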
1 vote · 1 answer

User behavior prediction using LSTMs

Consider a user U with 3 possible states: A, B, and C. From A you can go anywhere (including A); from B or C you can only go to A. LSTMs are good for modeling Markov problems with an extra notion of long-term memory across steps. But if I understand right,…
0xmax • 151
1 vote · 1 answer

Readout in RNNs (LSTM)

How do the readout functions in LSTMs work? Is the output of the last layer at timestep t transmitted to the first layer at timestep t+1, or also the cell state of the last layer at timestep t? Any sources on the readout in LSTMs?
Xxxo • 212
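
In a standard stacked LSTM, the hidden output h of a layer at step t is what the next layer sees at step t, while each layer's cell state c is carried only to that same layer's step t+1. A small Keras sketch (an assumption; the question names no library) that exposes both tensors:

```python
import numpy as np
from tensorflow.keras import layers, models

inp = layers.Input(shape=(30, 8))
# Layer 1's per-step hidden outputs h (not its cell state) feed layer 2:
h1 = layers.LSTM(16, return_sequences=True)(inp)
# return_state=True exposes layer 2's final h and c; c never leaves the layer:
out, h_last, c_last = layers.LSTM(16, return_state=True)(h1)
model = models.Model(inp, [out, h_last, c_last])

results = model.predict(np.random.rand(2, 30, 8), verbose=0)
print([r.shape for r in results])  # [(2, 16), (2, 16), (2, 16)]; out equals h_last
```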
1 vote · 1 answer

LSTM Classifying all the words as the same class

I've used Lasagne to build an LSTM model to classify words with IOB tags. About 25–40% of the training words' class is O, so they all receive the same int32 class number, 126. The words go through a context-window method in order to increase the…
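
With a quarter or more of the labels in one class, a loss that treats all classes equally rewards predicting O everywhere; one standard mitigation (framework-agnostic, and only an assumption about this asker's problem) is inverse-frequency class weighting:

```python
import numpy as np

# Hypothetical tag labels: class 126 ("O") dominates, as in the question.
labels = np.array([126] * 70 + [4] * 20 + [7] * 10)

# Inverse-frequency class weights: rare classes get larger weight, so the
# loss no longer rewards predicting the majority class everywhere.
classes, counts = np.unique(labels, return_counts=True)
weights = {int(c): len(labels) / (len(classes) * n) for c, n in zip(classes, counts)}
print(weights)  # {4: 1.67, 7: 3.33, 126: 0.48} (rounded)
```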
0 votes · 0 answers

Does an LSTM make sense in this context?

I have a continuous variable I wish to predict, for which I have sensor data at a fixed time interval t, denoted y_t. In addition, I have a set of features which are consistent across time, called f_t, but also time itself is a feature, which…
Jigeli • 11
0 votes · 0 answers

What to do if my LSTM model doesn't learn

I have taken text input, converted it to a sequence of values, and fed it to an LSTM model, but my loss is not decreasing and the accuracy is abnormal. The image above shows training and validation accuracy. Here I have used 10 epochs, but even for 100…