When it comes to sequence to sequence processing (seq2seq), recurrent neural networks (RNNs) have for several years been the go-to neural network architecture for natural language processing.
However, artificial intelligence is a quickly changing landscape, and since Google's landmark paper "Attention Is All You Need", RNNs are quickly becoming an architecture of the past, replaced by the Transformer architecture.
So what are Transformers? Without diving too far down the technical rabbit hole, Transformers are a new type of sequence processing network that uses encoders and decoders to selectively pay attention to key points of a sequence.
In a traditional RNN, and in its more favored form, the Long Short-Term Memory (LSTM) network, sequence to sequence processing suffers from the "long distance" problem.
The long distance problem simply means that the network starts to forget points in time in the distant past, so if we have a sentence:
"LSTM can often suffer from the long distance problem! Transformers can help..."
An LSTM network would give heavier weight to the end of the phrase, "Transformers can help...", than to the front of the phrase, "LSTM can often", when it comes to predicting the next words in the sequence. This isn't a problem for short sentences, but for longer paragraphs the problem becomes much more apparent in the forward predictions.
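To make the fading effect concrete, here is a minimal sketch using a toy linear recurrence, h_t = w * h_(t-1) + x_t. This is a deliberate simplification and not an LSTM: the function name `input_influence` and the recurrent weight of 0.8 are assumptions chosen only for illustration. LSTM gates are designed to soften this decay, but the general shape of the problem is the same.

```python
def input_influence(seq_len, recurrent_weight):
    """Influence of each input position on the final hidden state of the
    toy recurrence h_t = recurrent_weight * h_(t-1) + x_t."""
    # An input at position k is multiplied by the recurrent weight once for
    # every later step, so its influence is weight ** (steps remaining).
    return [recurrent_weight ** (seq_len - 1 - k) for k in range(seq_len)]


# The example sentence, split into rough tokens.
words = "LSTM can often suffer from the long distance problem ! Transformers can help".split()

# 0.8 is an arbitrary illustrative weight, not a value from a trained model.
influence = input_influence(len(words), recurrent_weight=0.8)

for word, score in zip(words, influence):
    print(f"{word:>12s}  {score:.3f}")
```

With any weight below 1, the earliest words end up with the smallest influence on the final state, and the gap grows with the length of the sequence.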
Transformers solve this by encoding whole sequences using Self-Attention. This means that a Transformer takes in the whole input sequence at once and, for each position, weighs every other word by how relevant it is, paying the most attention to the key words that help in predicting the next sequence.
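As a rough sketch of the idea, the snippet below computes scaled dot-product self-attention in plain Python over a few words from the example sentence above. The 2-dimensional embeddings are made-up numbers, and the sketch uses the embeddings directly as queries, keys and values; a real Transformer learns separate projections for each, uses many attention heads, and adds positional information.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Simplifying assumption: no learned projection matrices.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this position against every position, near or far.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # The output is a weighted mix of all the value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings, invented for illustration only.
tokens = ["LSTM", "suffer", "the", "problem", "Transformers", "help"]
embeddings = [
    [1.0, 0.2],
    [0.9, 0.1],
    [0.0, 0.1],
    [0.8, 0.3],
    [1.0, 0.4],
    [0.7, 0.5],
]

outputs, weights = self_attention(embeddings)

# How much the last token, "help", attends to each token in the sequence.
for token, weight in zip(tokens, weights[-1]):
    print(f"{token:>12s}  {weight:.3f}")
```

Note that the weight a word receives depends on how well it matches the query, not on how far away it is: in this toy setup "LSTM", the first word, gets more weight from "help" than the filler word "the" does.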
For example, before passing the sentence into the feed-forward part of the network, a Transformer may encode "LSTM", "suffer", "problem", "Transformers" and "can help"…
This helps Transformers overcome the long distance problem by giving emphasis to key points of a sequence regardless of where they occur, potentially allowing the network to better predict "solve this." as a viable next sequence prediction.
"LSTM can often suffer from the long distance problem! Transformers can help [solve this.]"
We've put together a great demo notebook that will allow you to play around with a pre-trained Transformer for English to Spanish translation! This pre-trained network comes courtesy of Hugging Face and the great work they've been doing over there.
You can access the Python notebook here; just click 'Runtime -> Run All' and give the notebook a couple of minutes to download the dependencies. You can also translate a different English sentence by editing the sentence in the data array.
We use SHapley Additive exPlanations (SHAP) to show a heatmap of which input words the Transformer relies on most while translating each word, so you can see how the self-attention mechanism is working under the hood!
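If you would rather try the translation step outside the notebook, something along these lines should work with the Hugging Face `transformers` library. This is a sketch under assumptions, not the notebook's actual code: the helper name `translate_en_to_es` is hypothetical, and the model name `Helsinki-NLP/opus-mt-en-es` is our guess at a suitable English to Spanish model, which may differ from the one the notebook loads.

```python
def translate_en_to_es(sentences, model_name="Helsinki-NLP/opus-mt-en-es"):
    """Translate a list of English sentences to Spanish.

    Assumption: model_name is one publicly available English-to-Spanish
    model; the demo notebook may use a different one.
    """
    # Imported here so that defining the function does not require the
    # library to be installed; calling it does.
    from transformers import pipeline

    translator = pipeline("translation", model=model_name)
    results = translator(sentences)
    # Each result is a dict holding the translated text.
    return [result["translation_text"] for result in results]
```

Calling `translate_en_to_es(["Transformers can help solve this."])` downloads the model on first use, so expect a short wait. It assumes `transformers`, a backend such as PyTorch, and `sentencepiece` are installed.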
Tags: rnn, transformers, artificial intelligence, machine learning, new technology, google