
According to these equations from Wikimedia:

LSTM equations from wikimedia

each gate has two weight matrices, W and U. But according to this forget gate equation from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

each gate has only one weight matrix, W, and in the example code:

z = np.row_stack((h_prev, x))
f = sigmoid(np.dot(W_f, z) + b_f)
i = sigmoid(np.dot(W_i, z) + b_i)

from http://blog.varunajayasiri.com/numpy_lstm.html, apparently

the input and the previous hidden state are stacked together and multiplied by the same weight matrix.

So,

  1. Does each gate in an LSTM have one weight matrix or two? Or am I misunderstanding them?

  2. If two, what does the U matrix (Wikimedia) mean?

Alex Luya

1 Answer


They are the same thing.

$W\cdot [a; b] = U\cdot a + V\cdot b$
if $W=[U\; V]$

Here $[a;b]$ is the vertical concatenation of the vectors (considered as column matrices) and $[U\; V]$ is the horizontal concatenation of the matrices.

The difference is purely notational.
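
Applied to the forget gate, this might look as follows. The sketch assumes the Wikipedia convention, in which $W_f$ multiplies the input $x_t$ and $U_f$ multiplies the previous hidden state $h_{t-1}$, and the stacking order $[h_{t-1}; x_t]$ used in the colah post. The symbol $\tilde{W}_f$ is a label introduced here for the single stacked matrix; it does not appear in either source.

$$\tilde{W}_f \cdot [h_{t-1}; x_t] + b_f = U_f \cdot h_{t-1} + W_f \cdot x_t + b_f \quad \text{if } \tilde{W}_f = [U_f\; W_f]$$

So the single $W_f$ in the colah post plays the role of the two Wikipedia matrices placed side by side.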

Consider how matrix multiplication works, or look at it as a block matrix product:

$$[U\; V]\cdot [a;b] = U\cdot a + V\cdot b $$
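
The block identity above can be checked numerically. This is a minimal NumPy sketch: the sizes and random values are arbitrary assumptions chosen for illustration, and `U`, `V`, `h_prev`, and `x` are just stand-ins for one gate's weights and inputs, not taken from any of the linked implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2  # arbitrary sizes, for illustration only

# Two separate matrices, one per input (the "two matrices" view).
U = rng.standard_normal((hidden_size, hidden_size))  # multiplies h_prev
V = rng.standard_normal((hidden_size, input_size))   # multiplies x

h_prev = rng.standard_normal((hidden_size, 1))
x = rng.standard_normal((input_size, 1))

# One matrix acting on the stacked vector (the "one matrix" view).
W = np.hstack((U, V))        # [U V], shape (3, 5)
z = np.vstack((h_prev, x))   # [h_prev; x], shape (5, 1)

stacked = W @ z
separate = U @ h_prev + V @ x

# Both views give the same pre-activation, up to floating-point error.
same = np.allclose(stacked, separate)
print(same)
```

The column order of `W` has to match the row order of `z`: if the vector is stacked as `[h_prev; x]`, the matrix must be `[U V]` with `U` first.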