One of the shortfalls of the Recurrent Neural Network (RNN) is modelling problems with long-term dependencies.
An RNN tends to forget earlier information, references, and context, which makes it unsuitable for such problems.
RNNs handle sequential data well, but they run into trouble when the relevant context lies far back in the sequence.
Example: "I have lived in France for the last 20 years and I speak ____." The answer here must be 'French'. But add more words in between, as in "I have lived in France for the last 20 years and have worked as a consultant for the last 10 years, and I speak ____", and it becomes difficult for an RNN to predict 'French', as the relevant context gets buried among other information.
The vanishing gradient problem makes this worse: as errors are propagated back through many time steps during training, the gradients shrink exponentially, so the network receives almost no learning signal from distant inputs. Unlike an RNN, which remembers or forgets information in bulk, the LSTM does so selectively, using a mechanism built around a "cell state" and gates. "Sequence prediction problems" can be really complex at times, and an LSTM can selectively remember patterns over long stretches of a sequence.
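The shrinking of gradients can be illustrated with a toy calculation. In backpropagation through time, the gradient reaching a step far in the past is roughly a product of one Jacobian factor per time step; if those factors have magnitude below 1, the product decays exponentially. The factor 0.5 below is a hypothetical recurrent weight chosen purely for illustration:

```python
# Toy illustration of the vanishing gradient: repeatedly multiplying
# by a per-step factor below 1 (here a hypothetical 0.5) drives the
# gradient toward zero as the dependency gets longer.
w = 0.5
gradients = [w ** t for t in (1, 10, 50)]
print(gradients)
# After 50 steps the gradient is on the order of 1e-15: effectively
# no learning signal survives from that far back.
```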
The cell state works like a 'conveyor belt': information can be added, removed, or modified by the gates as it moves from one time step to the next. Because the update to the cell state is additive, gradients can flow along it largely unchanged, which mitigates the vanishing gradient problem.
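The gates and the conveyor-belt update can be sketched in a few lines of numpy. This is a minimal single-step LSTM cell, not any particular library's implementation; the names (`lstm_step`, `W`, `b`) and the toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W stacks the four gate weight matrices (forget, input,
    # candidate, output) applied to [input, previous hidden state].
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.size
    f = sigmoid(z[:n])        # forget gate: what to erase from the cell state
    i = sigmoid(z[n:2*n])     # input gate: what new information to write
    g = np.tanh(z[2*n:3*n])   # candidate values to add
    o = sigmoid(z[3*n:])      # output gate: what to expose as the hidden state
    c = f * c_prev + i * g    # the "conveyor belt": additive cell-state update
    h = o * np.tanh(c)
    return h, c

# Toy dimensions: 3 input features, 2 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
h, c = np.zeros(2), np.zeros(2)
W, b = rng.normal(size=(8, 5)) * 0.1, np.zeros(8)
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (2,) (2,)
```

The key line is the cell-state update `c = f * c_prev + i * g`: the forget gate decides how much of the old belt contents to keep, and the input gate decides how much new material to place on it.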
The LSTM is so far the most successful RNN variant for solving complex sequence problems. What makes it a successful artificial neural network topology is its loose resemblance to how the human brain operates: it selectively remembers or forgets information based on context.
It has also proven very effective in areas like NLP and drug discovery, where it trains on sequences of data to predict the most suitable output. For product design, replace a dataset of millions of phrases and sentences with a database of millions of characteristics and formulae of chemicals, metals, polymers, and so on, and an LSTM can predict the most suitable alloy or material. Replace it with millions of circuits and designs, and it can predict the most suitable architecture.
The possibilities are endless.