
In CNN-LSTM/GRU based networks, is it still possible to relate the weights of the attention layer directly to the inputs, to visualize which parts of the input the model gives more weight to?

Here I'm using a CNN-GRU to predict the next value of 2 output (dependent) time series based on 1 input time series (similar to the tutorial here). The window size (n_steps) is 80, and to use the CNN we reshape the input like this:

# choose a number of time steps
n_steps = 80
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps_in_each_seq = 40
X = X.reshape((X.shape[0], n_seq, n_steps_in_each_seq, n_features))
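For reference, `split_sequence` is the helper from the linked tutorial; a minimal sketch of it, assuming the standard sliding-window behaviour (each window of `n_steps` values paired with the value that follows it), on a dummy series:

```python
import numpy as np

# Sliding-window split, as in the tutorial: each sample is a window of
# n_steps consecutive values; the target is the value right after it.
def split_sequence(sequence, n_steps):
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

raw_seq = np.arange(100, dtype=float)  # dummy series just for illustration
X, y = split_sequence(raw_seq, n_steps=80)
print(X.shape, y.shape)  # (20, 80) (20,)
```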

Here is my model definition:

import keras
from keras.layers import (Input, Dense, TimeDistributed, Conv1D,
                          MaxPooling1D, Flatten, Bidirectional, GRU)
from keras_self_attention import SeqWeightedAttention

ipt = Input(shape=(None, n_steps_in_each_seq, n_features))
x = TimeDistributed(Conv1D(filters=64, kernel_size=2, activation='relu'))(ipt)
x = TimeDistributed(MaxPooling1D(pool_size=2))(x)
x = TimeDistributed(Flatten())(x)
x = Bidirectional(GRU(600, activation='relu', return_sequences=True))(x)
att_layer, att_weights = SeqWeightedAttention(return_attention=True)(x)
out = Dense(2)(att_layer)
model = keras.models.Model(ipt, out)
model.compile(optimizer='adam', loss='mse', metrics=['acc'])

The model summary goes like this:

Model: "model_1"
_________________________________________________________________
Layer (type)                                    Output Shape                  Param #
=================================================================
input_1 (InputLayer)                            (None, None, 40, 1)           0
time_distributed_1 (TimeDistributed)            (None, None, 39, 64)          192
time_distributed_2 (TimeDistributed)            (None, None, 19, 64)          0
time_distributed_3 (TimeDistributed)            (None, None, 1216)            0
bidirectional_1 (Bidirectional)                 (None, None, 1200)            6541200
seq_weighted_attention_1 (SeqWeightedAttention) [(None, 1200), (None, None)]  1201
dense_1 (Dense)                                 (None, 2)                     2402
=================================================================
Total params: 6,544,995
Trainable params: 6,544,995
Non-trainable params: 0
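The per-layer shapes in the summary follow directly from the hyperparameters, which can be sanity-checked with a little arithmetic:

```python
# Sanity-check the summary shapes from the layer hyperparameters
n_steps_in_each_seq, n_features = 40, 1
filters, kernel_size, pool_size = 64, 2, 2
gru_units = 600

conv_len = n_steps_in_each_seq - kernel_size + 1  # 'valid' Conv1D: 40 - 2 + 1 = 39
pool_len = conv_len // pool_size                  # MaxPooling1D: 39 // 2 = 19
flat_len = pool_len * filters                     # Flatten: 19 * 64 = 1216
bigru_dim = 2 * gru_units                         # Bidirectional GRU: 2 * 600 = 1200
print(conv_len, pool_len, flat_len, bigru_dim)    # 39 19 1216 1200
```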

I want to relate the attention weights to the input window, whose length is 80. However, the dimensions of the attention weights do not match the input. When I try to get the output of the attention layer (as explained here), I don't get the attention layer output and the weights separately, and they are not in the right dimensions to be matched with the input. I get the weights like this:

outs = get_layer_outputs(model, 'seq', X, 1)
outs_1 = outs[0][0] # additional index since using batch_shape
outs_2 = outs[1][0]
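If I read the summary shapes right, the attention weights have one entry per subsequence (the `(None, None)` output resolves to `(batch, n_seq)`), not one per timestep, since the attention sits after `TimeDistributed(Flatten())`. The only mapping back to the 80-step window I can think of is to repeat each weight over the timesteps of its subsequence, something like this (the weight values here are made up for illustration):

```python
import numpy as np

# Hypothetical per-subsequence attention weights for one sample, i.e. the
# second output of SeqWeightedAttention; matching the summary, there are
# n_seq = 2 subsequences of 40 timesteps each (values are made up).
n_seq, n_steps_in_each_seq = 2, 40
w = np.array([0.3, 0.7])

# Each weight scores one whole subsequence, so repeating it across that
# subsequence's timesteps yields a per-timestep weight over the full window.
w_per_timestep = np.repeat(w, n_steps_in_each_seq)
print(w_per_timestep.shape)  # (80,)
```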

I'm not sure what I'm doing wrong here. I hope someone can point it out.

