
I'm solving a classification task on a time-series dataset.
I use a Transformer encoder with a learned positional encoding, i.e. a learned matrix in $\mathbb{R}^{seq \times embedding}$.
Naturally, this fixes the sequence length the model can process. I had the idea of doing learned positional encoding with an LSTM instead.
That is, we project the sequence of tokens onto the embedding dimension with a linear layer, feed the embeddings to an LSTM layer, and add its hidden states back to the embeddings:
$x = \mathrm{MLP}(x)$
$x = x + \mathrm{LSTM}(x)$
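
For concreteness, here is a minimal PyTorch sketch of what I have in mind (layer sizes and names are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class LSTMPositionalEncoding(nn.Module):
    """Project tokens to the embedding dimension, then add LSTM hidden
    states as a learned, length-independent positional signal."""
    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, embed_dim)                # x = MLP(x)
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) -- seq_len may differ between batches
        x = self.proj(x)
        pos, _ = self.lstm(x)      # hidden state at every time step
        return x + pos             # x = x + LSTM(x)

# Usage: the output can go straight into a standard TransformerEncoder.
# enc_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
# encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
# tokens = torch.randn(8, 120, 10)   # any sequence length works
# out = encoder(LSTMPositionalEncoding(10, 64)(tokens))
```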

Do you think this will have the right effect?
Are there any things to consider?

debrises

1 Answer


At first sight, it should have the intended effect. However, an LSTM has its limits: it cannot properly process time series of arbitrary length.

For instance, for very short inputs (sequence length < 20) a classic RNN might even outperform an LSTM (which typically handles lengths of roughly 250-500), depending also on the variability of your data.

That is why the different inputs should be comparable to get good predictions, so some form of data scaling may be necessary.

I suggest you study the LSTM paper to get a good understanding of its limits and of how the data is processed.

Otherwise, I would need more information to be more specific about potential solutions.

Note: time series classification can also be done with dimensionality reduction algorithms.
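
One simple way to do that, sketched here with scikit-learn on synthetic data (purely illustrative, not tuned for your dataset), is to treat each fixed-length series as a feature vector, reduce it with PCA, and feed it to a plain classifier:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic example: 200 series of length 100, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = rng.integers(0, 2, size=200)

# Reduce each series to 10 components, then classify.
clf = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.score(X, y))
```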

Nicolas Martin
  • What do you mean by "data scaling"? Thanks for the answer. – debrises Jun 12 '22 at 21:39
  • It could mean data normalization, or modifying the data to fit a fixed input length. For instance, if you have [23,10,26,32] and [50,68], you would change them to [23,10,26,32] and [0,0,50,68] (or use point extrapolation). It depends on your business needs. – Nicolas Martin Jun 13 '22 at 07:05
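
For reference, a minimal NumPy sketch of the left-padding idea from the comment above (the values are the illustrative ones from the comment):

```python
import numpy as np

def pad_to_length(series, target_len, pad_value=0.0):
    """Left-pad a 1-D series with pad_value so every input has the same length."""
    series = np.asarray(series, dtype=float)
    pad = target_len - len(series)
    return np.concatenate([np.full(pad, pad_value), series]) if pad > 0 else series

sequences = [[23, 10, 26, 32], [50, 68]]
max_len = max(len(s) for s in sequences)
padded = np.stack([pad_to_length(s, max_len) for s in sequences])
# padded -> [[23., 10., 26., 32.],
#            [ 0.,  0., 50., 68.]]
```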