Questions tagged [lstm]

A Long Short-Term Memory (LSTM) is a neural network architecture containing recurrent blocks that can remember a value for an arbitrary length of time.

An LSTM has the following core components (not present in vanilla RNNs):

  1. Forget gate: allows the LSTM to forget its past state, or to retain some elements of it
  2. Input gate: decides what part of the new input arriving at the current step should be allowed to influence the cell's state
  3. Output gate: determines what part of the cell's output should be allowed to flow out, typically to be consumed as a prediction

A "cell" is the term used for an individual LSTM unit; the three gates are sketched below.
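
A minimal numpy sketch of one LSTM time step, following the standard gate equations; the weight names (Wf, Uf, bf, …) are illustrative rather than tied to any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Uf, bf, Wi, Ui, bi, Wo, Uo, bo, Wc, Uc, bc):
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)        # forget gate: keep/drop old state
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)        # input gate: admit new information
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)        # output gate: expose the cell
    c_tilde = np.tanh(Wc @ x + Uc @ h_prev + bc)  # candidate cell state
    c = f * c_prev + i * c_tilde                  # updated cell state (the "memory")
    h = o * np.tanh(c)                            # hidden state / output
    return h, c

# Tiny demo: 3 input features, 5 hidden units, random weights.
rng = np.random.default_rng(0)
d_x, d_h = 3, 5
weights = []
for _ in range(4):  # one (W, U, b) triple each for f, i, o and the candidate
    weights += [rng.standard_normal((d_h, d_x)), rng.standard_normal((d_h, d_h)), np.zeros(d_h)]
h, c = lstm_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), *weights)
print(h.shape, c.shape)  # (5,) (5,)
```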
810 questions
4 votes · 2 answers

LSTM: shape of tensors?

I'm trying to understand LSTMs, using for instance http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I get the overall idea, I guess, but I'm not quite sure I get the maths. I'll set up a very simple problem: I have a sequence of numbers and…
user3617487
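
The question is truncated, but the shape bookkeeping it asks about can be illustrated with a short sketch; Keras is an assumption here (the question names no library), and the data are random placeholders:

```python
import numpy as np
from tensorflow.keras import layers, models

timesteps, features, units = 10, 1, 32
X = np.random.rand(100, timesteps, features)  # input tensor: (samples, timesteps, features)
y = np.random.rand(100, 1)                    # one target per sequence

model = models.Sequential([
    layers.LSTM(units, input_shape=(timesteps, features)),  # output tensor: (batch, units)
    layers.Dense(1),                                        # output tensor: (batch, 1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, verbose=0)
```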
4 votes · 1 answer

Why would an LSTM converge to a fixed state when generating sequences?

I want to generate some sequences using LSTMs, like in Karpathy's char-rnn; I do nearly everything identically. During training the network greatly decreases the error over time, until it converges to some local minimum of the error…
user143877
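
One common culprit in char-rnn-style generation is feeding back a greedy argmax at each step, which can drive the output to a fixed point; whether that applies to this asker is an assumption, but the usual remedy is temperature sampling, sketched here:

```python
import numpy as np

def sample_next(probs, temperature=1.0):
    """Draw the next symbol from the network's output distribution.

    temperature -> 0 approaches greedy argmax, which tends to collapse
    generation onto a fixed state; temperature around 0.5-1.0 keeps
    the sampled sequences varied."""
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)

# Demo with a made-up output distribution over 4 symbols:
print(sample_next(np.array([0.7, 0.2, 0.05, 0.05]), temperature=0.8))
```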
2 votes · 0 answers

Do we get the best performance with "batch_size = 1" (especially for LSTM)?

In my experience, choosing batch_size = 1 gives the best result and choosing batch_size = the whole data set size gives the worst, and there seems to be a linear or exponential relation between these two numbers (I mean choosing a number nearer…
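
The claim is easy to test empirically; here is a hedged Keras sketch on hypothetical random data (the asker's actual setup is unknown) that compares extreme batch sizes:

```python
import numpy as np
from tensorflow.keras import layers, models

X = np.random.rand(256, 20, 1)  # placeholder sequences
y = np.random.rand(256, 1)

for batch_size in (1, 32, 256):  # 256 == the whole data set here
    model = models.Sequential([
        layers.LSTM(16, input_shape=(20, 1)),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    hist = model.fit(X, y, batch_size=batch_size, epochs=5, verbose=0)
    print(f"batch_size={batch_size}: final loss {hist.history['loss'][-1]:.4f}")
```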
2 votes · 1 answer

Does each gate have one or two matrices in an LSTM?

According to these equations on Wikimedia, each gate has two weight matrices, W and U respectively, but according to this, from http://colah.github.io/posts/2015-08-Understanding-LSTMs/, each gate has only one weight matrix, W, and in the example code: z =…
Alex Luya • 123 • 4
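
The two notations describe the same computation: a single matrix acting on the concatenation [x, h] is equivalent to separate W and U acting on x and h. A quick numpy check, with dimensions chosen arbitrarily:

```python
import numpy as np

d_x, d_h = 3, 4
rng = np.random.default_rng(0)
x, h = rng.random(d_x), rng.random(d_h)
W, U = rng.random((d_h, d_x)), rng.random((d_h, d_h))

# Two matrices, as in the Wikimedia equations:
z_two = W @ x + U @ h

# One matrix acting on the concatenation [x, h], as in colah's post:
W_cat = np.hstack([W, U])               # shape (d_h, d_x + d_h)
z_one = W_cat @ np.concatenate([x, h])

assert np.allclose(z_two, z_one)        # identical up to notation
```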
2 votes · 2 answers

LSTM with multidimensional input

It's hard to find literature where LSTMs are used with multidimensional input. I know that an LSTM admits multiple time series as input (multidimensional input) with the shape (samples, look back, dimension). Dimension could be electricity demand,…
J.Cirera • 313
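
A minimal sketch of exactly that input layout, assuming Keras (the question quotes its (samples, look back, dimension) convention) and random placeholder data:

```python
import numpy as np
from tensorflow.keras import layers, models

samples, look_back, dimension = 500, 24, 3         # e.g. demand, temperature, price
X = np.random.rand(samples, look_back, dimension)  # (samples, look back, dimension)
y = np.random.rand(samples, 1)                     # next-step target

model = models.Sequential([
    layers.LSTM(32, input_shape=(look_back, dimension)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, verbose=0)
```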
1 vote · 1 answer

What could cause my LSTM loss to decrease then increase?

I'm training 5 stacked Bi-LSTMs on an NLP task. The network fits well with a time series of length 30, and converges to around 0.97 AUROC. However, when I increase the length of the time series to 50, this happens: [loss plot omitted]. I'm not using masking (slows…
1 vote · 0 answers

LSTM Vanishing Gradients

I'm trying to implement an LSTM model for text classification where each sentence is about 1500 words. I converted each sentence to a sequence of values and fed it to the LSTM, but the gradients are becoming zero. I'm unable to fix it. Why is the LSTM facing a vanishing…
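
A first diagnostic step for a report like this is to print per-variable gradient norms; the sketch below assumes TensorFlow/Keras (the question names no framework) and random stand-in data with the stated 1500-step sequences:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(1500, 1)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
X = np.random.rand(8, 1500, 1).astype("float32")           # stand-in sequences
y = np.random.randint(0, 2, size=(8, 1)).astype("float32")

with tf.GradientTape() as tape:
    pred = model(X, training=True)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
grads = tape.gradient(loss, model.trainable_variables)

for var, grad in zip(model.trainable_variables, grads):
    print(var.name, float(tf.norm(grad)))  # near-zero norms confirm vanishing gradients
```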
1 vote · 0 answers

LSTM vector sum of inputs as memory

I want to set up a long short-term memory (LSTM) network to have the vector sum of the inputs, which live in R^d, as its memory (c_t). What choices of weights and activation functions are required to do this?
harry • 11
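
One way to get c_t equal to the sum of x_1..x_t out of the standard update c_t = f ⊙ c_{t-1} + i ⊙ c̃_t is to saturate the forget and input gates at 1 (large positive biases, zero weights) and make the candidate the identity of the input (W_c = I, U_c = 0, linear activation). A numpy sketch of that construction, offered as one possible answer rather than the asker's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 4
rng = np.random.default_rng(0)
xs = [rng.random(d) for _ in range(10)]

big = 50.0            # large positive gate bias: sigmoid(50) is 1 to machine precision
c = np.zeros(d)
for x in xs:
    f = sigmoid(np.zeros(d) + big)  # forget gate ~ 1: keep all of c_{t-1}
    i = sigmoid(np.zeros(d) + big)  # input gate  ~ 1: admit all of the candidate
    c_tilde = np.eye(d) @ x         # candidate = identity(x): W_c = I, U_c = 0, linear
    c = f * c + i * c_tilde         # c_t ~ c_{t-1} + x_t

assert np.allclose(c, sum(xs))      # the memory is the running vector sum
```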
1 vote · 1 answer

User behavior prediction using LSTMs

Consider a user U with 3 possible states: A, B, and C. From A you can go anywhere (including A); from B or C you can only go to A. LSTMs are good for modeling Markov problems with an extra notion of long-term memory across steps. But if I understand right,…
0xmax • 151
1 vote · 1 answer

Readout in RNNs (LSTM)

How do the readout functions in LSTMs work? Is the output of the last layer at timestep t transmitted to the first layer at timestep t+1, or also the cell state of the last layer at timestep t? Any sources on the readout in LSTMs?
Xxxo • 212
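
In a standard stacked LSTM, the hidden output h of a layer at step t is what the next layer sees at step t, while each layer's cell state c is carried only to that same layer's step t+1. A small Keras sketch (an assumption; the question names no library) that exposes both tensors:

```python
import numpy as np
from tensorflow.keras import layers, models

inp = layers.Input(shape=(30, 8))
# Layer 1's per-step hidden outputs h (not its cell state) feed layer 2:
h1 = layers.LSTM(16, return_sequences=True)(inp)
# return_state=True exposes layer 2's final h and c; c never leaves the layer:
out, h_last, c_last = layers.LSTM(16, return_state=True)(h1)
model = models.Model(inp, [out, h_last, c_last])

results = model.predict(np.random.rand(2, 30, 8), verbose=0)
print([r.shape for r in results])  # [(2, 16), (2, 16), (2, 16)]; out equals h_last
```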
1 vote · 1 answer

LSTM Classifying all the words as the same class

I've used Lasagne to build an LSTM model to classify words with IOB tags. About 25–40% of the training words' class is O, so they all receive the same int32 class number, 126. The words go through a context-window method in order to increase the…
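
With a quarter or more of the labels in one class, a loss that treats all classes equally rewards predicting O everywhere; one standard mitigation (framework-agnostic, and only an assumption about this asker's problem) is inverse-frequency class weighting:

```python
import numpy as np

# Hypothetical tag labels: class 126 ("O") dominates, as in the question.
labels = np.array([126] * 70 + [4] * 20 + [7] * 10)

# Inverse-frequency class weights: rare classes get larger weight, so the
# loss no longer rewards predicting the majority class everywhere.
classes, counts = np.unique(labels, return_counts=True)
weights = {int(c): len(labels) / (len(classes) * n) for c, n in zip(classes, counts)}
print(weights)  # {4: 1.67, 7: 3.33, 126: 0.48} (rounded)
```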
0 votes · 0 answers

Does an LSTM make sense in this context?

I have a continuous variable I wish to predict, for which I have sensor data at a fixed time interval t, denoted y_t. In addition, I have a set of features which are consistent across time, called f_t, but also time itself is a feature, which…
Jigeli • 11
0 votes · 0 answers

What to do if my LSTM model doesn't learn

I have taken text input, converted it to a sequence of values, and fed it to an LSTM model, but my loss is not decreasing and the accuracy is abnormal. The image above shows training and validation accuracy. Here I have used 10 epochs, but even for 100…