- Provides a general end-to-end approach to sequence learning.
- Introduces a new class of "Encoder-Decoder" architectures for seq2seq learning.
- A straightforward LSTM architecture can be used to solve general sequence-to-sequence problems.
- The idea is to use one LSTM to read the input sequence and obtain a fixed-dimensional "encoding".
- This encoding is then used by a second LSTM to generate the output sequence; this second LSTM acts as the decoder, essentially an RNN language model conditioned on the input sequence (a minimal sketch follows after this list).
- LSTMs are used because they can handle long-range temporal dependencies and can map variable-length inputs to a fixed-dimensional vector representation.
- The input sentences are fed in reverse order, which introduces many short-term dependencies between source and target words, and a special end-of-sentence symbol is appended to every sequence (illustrated after this list).
- This architecture achieved state-of-the-art performance in 2014 on the WMT English-to-French translation task, and the LSTM did not suffer on longer sentences.
- Introduces seq2seq learning with LSTMs to the deep learning community.
- The ability to turn sentences into a fixed-dimensional vector makes the representations easy to visualize and lets one check whether the model "learns" semantic properties (see the embedding sketch at the end of this list).
- Interpretability of the encoder output: how can this "fixed-dimensional vector" be interpreted, and can it be thought of as a sentence embedding?
- There is no clear explanation for why reversing the input helps.
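
A minimal sketch of the encoder-decoder idea described above, assuming PyTorch; the class names, vocabulary sizes, and dimensions are illustrative and do not reproduce the paper's deep, 1000-unit LSTM setup.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len) token ids
        _, (h, c) = self.lstm(self.embed(src))   # keep only the final LSTM states
        return h, c                              # the fixed-dimensional "encoding"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, state):               # an RNN LM conditioned via `state`
        output, state = self.lstm(self.embed(tgt), state)
        return self.out(output), state           # per-step vocabulary logits

# Tiny usage example with random token ids.
enc, dec = Encoder(100, 32, 64), Decoder(120, 32, 64)
src = torch.randint(0, 100, (2, 7))              # batch of 2 source sequences
tgt = torch.randint(0, 120, (2, 5))              # teacher-forced target inputs
logits, _ = dec(tgt, enc(src))                   # logits: (2, 5, 120)
```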
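The input reversal and end-of-sentence handling could look like the toy preprocessing below; this is my own example, not the paper's pipeline, and the `<EOS>` token name is just a placeholder.

```python
EOS = "<EOS>"

def prepare(src_tokens, tgt_tokens):
    # Reverse only the source sentence; the target keeps its natural order.
    return list(reversed(src_tokens)) + [EOS], list(tgt_tokens) + [EOS]

src, tgt = prepare(["the", "cat", "sat"], ["le", "chat", "est", "assis"])
print(src)  # ['sat', 'cat', 'the', '<EOS>']
print(tgt)  # ['le', 'chat', 'est', 'assis', '<EOS>']
```

Reversing the source means the first target word is predicted right after the model has just read the first source word, which is one intuition for the shorter dependencies mentioned above.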
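On the sentence-embedding question, one hypothetical way to probe the encoder (reusing the illustrative `enc` from the first sketch) is to take its final hidden state as a fixed-dimensional sentence vector and compare sentences with cosine similarity; the paper itself visualizes such vectors with a 2-D projection.

```python
import torch
import torch.nn.functional as F

with torch.no_grad():
    h, _ = enc(torch.randint(0, 100, (5, 7)))   # 5 toy "sentences" of length 7
    embeddings = h[-1]                          # (5, 64) fixed-dimensional vectors

# Similarity between the first two "sentences" in the batch.
sim = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(float(sim))
```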