Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they yield a succession of pointers to the elements of the input sequence. The most basic use of this is ordering the elements of a variable-length sequence or set.
Basic seq2seq is an LSTM encoder coupled with an LSTM decoder. It’s most often heard of in the context of machine translation: given a sentence in one language, the encoder turns it into a fixed-size representation. The decoder then transforms this back into a sentence, possibly of a different length than the source. For example, “como estas?” – two words – would be translated to “how are you?” – three words.
The model gives better results when augmented with attention. In practice, this means that the decoder can look back and forth over the input. Specifically, it has access to encoder states from each step, not just the last one. Consider how this may help with Spanish, in which adjectives go after nouns: “neural network” becomes “red neuronal”.
In technical terms, attention (at least this particular kind, content-based attention) boils down to weighted averages. In short, at each decoder step a weighted average of encoder states is fed to the decoder as context. Attention is just the distribution of weights.
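To make the weighted-average view concrete, here’s a minimal numpy sketch. The shapes and values are made up for illustration; a real model would learn the encoder and decoder states, and many attention variants score with a small network rather than a plain dot product:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical values: 4 encoder steps, hidden size 3
encoder_states = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0],
                           [1.0, 1.0, 0.0]])
decoder_state = np.array([1.0, 1.0, 0.0])

# Content-based scores: dot product of the decoder state with each encoder state
scores = encoder_states @ decoder_state   # shape (4,)
weights = softmax(scores)                 # the attention distribution
context = weights @ encoder_states        # weighted average of encoder states

print(weights)   # sums to 1 — a probability distribution over input steps
print(context)   # same shape as a single encoder state
```

The `weights` vector is the attention; the `context` vector is what the decoder actually consumes.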
Here’s more on seq2seq and attention in Keras.
In pointer networks, attention is even simpler: instead of weighting input elements and averaging them, it points at them probabilistically. In effect, you get a permutation of inputs. Refer to the paper for details and equations.
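A toy numpy sketch of greedy pointer decoding, under simplifying assumptions: the scores here are faked with a fixed dot product (a trained network would compute a fresh query at every step), and already-chosen inputs are masked out so the pointers form a permutation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical inputs: 3 elements with hidden size 1
encoder_states = np.array([[0.1], [0.9], [0.5]])
query = np.array([1.0])  # stand-in for the decoder's per-step query

chosen = []
mask = np.zeros(3, dtype=bool)
for step in range(3):
    scores = (encoder_states @ query).copy()  # content-based scores
    scores[mask] = -np.inf     # don't point at already-used inputs
    probs = softmax(scores)    # the attention IS the output: a distribution
                               # over input positions, not a weighted average
    i = int(np.argmax(probs))  # the pointer: an index into the input
    chosen.append(i)
    mask[i] = True

print(chosen)  # a permutation of input indices
```

With these made-up scores the pointers come out in descending-score order, which is exactly the ordering use case mentioned above.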
Note that one doesn’t need to use all the pointers. For example, given a piece of text, a network could mark an excerpt by pointing at two elements: where it starts, and where it ends.
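The start/end idea can be sketched in a few lines. The two distributions below are invented stand-ins for what two decoding steps of a pointer network might produce over a piece of text:

```python
import numpy as np

# Hypothetical pointer distributions over a 6-token input
tokens = ["the", "cat", "sat", "on", "the", "mat"]
p_start = np.array([0.05, 0.70, 0.10, 0.05, 0.05, 0.05])
p_end   = np.array([0.05, 0.05, 0.60, 0.10, 0.10, 0.10])

start = int(np.argmax(p_start))   # pointer 1: where the excerpt starts
end = int(np.argmax(p_end))       # pointer 2: where it ends
excerpt = tokens[start:end + 1]

print(excerpt)  # ['cat', 'sat']
```

Two pointers suffice here; the rest of the decoding steps are simply never taken.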