Learn what an LSTM neural network is, how it works, the benefits and limitations compared to other kinds of neural networks, common uses, and specific industry applications.
An LSTM (long short-term memory) neural network is a type of recurrent neural network that can retain information over long stretches of a sequence and apply that stored information to later calculations. First proposed in 1997, the LSTM is a deep learning architecture that overcomes several problems standard recurrent neural networks face, including those associated with memory storage.
LSTM neural networks can be used for language translation, video analysis, keyword spotting, speech-to-text transcription, and language modeling. This article explains how LSTM neural networks work and outlines how to start a career in the field.
This specialized type of recurrent network can retain stored information for longer. LSTM neural networks also overcome a common issue in traditional recurrent neural networks (RNNs) called gradient dispersion, better known as the vanishing gradient problem.
To understand how a long short-term memory neural network functions, it helps to first learn a bit about RNNs in general. Recurrent neural networks remember the results of previous inputs and can use past trends to inform current calculations.
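The recurrence at the heart of an RNN can be sketched in a few lines. This is a minimal, single-unit toy with hypothetical weight values, not a trained model; it only illustrates how each new hidden state mixes the current input with the previous one, giving the network its "memory."

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One step of a toy single-unit RNN: the new hidden state combines
    the current input x with the previous hidden state h_prev (the memory)."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Feed a short sequence through the cell; every step sees the past via h.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h, w_x=0.8, w_h=0.5, b=0.0)
```

Because `h` is reused at every step, the final hidden state depends on the entire sequence, which is what lets an RNN use past trends to inform current calculations.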
For example, you can use neural networks for language translation because the neural network can learn how word placement within a sentence affects meaning in different languages. “The black cat” follows English grammar rules, but to translate it to Spanish, you will need to swap the placement of the noun and the adjective: “El gato negro.” A recurrent neural network can remember these details and provide you with an accurate translation.
In addition to offering more robust memory, LSTM networks also ignore useless data to overcome the vanishing gradient problem experienced with traditional RNNs. To “remember” the data, a traditional RNN uses backpropagation to feed the output back through the neural network and update the weights of each layer accordingly, starting with the output layer and working backward to the input.
Over time, the gradient, which is the signal that determines how much each weight should change, becomes smaller and smaller as it travels backward through the network. This can prevent the network from updating its weights at all, or limit it to very small changes, especially in the first few layers. Ultimately, the network's rate of learning slows dramatically, and learning may even stop entirely.
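A quick numeric sketch shows why the gradient vanishes. Backpropagating through many time steps multiplies the gradient by a per-step factor; the value 0.5 below is a hypothetical factor, chosen only to illustrate the geometric shrinkage.

```python
# Toy illustration of the vanishing gradient: if the per-step factor
# |dh_t / dh_(t-1)| is below 1 (common with saturating activations and
# small weights), the gradient shrinks geometrically over time steps.
grad = 1.0
per_step_factor = 0.5  # hypothetical per-step gradient factor

for step in range(20):
    grad *= per_step_factor

# After 20 steps, grad is 0.5**20, on the order of 1e-6: the earliest
# layers receive almost no learning signal.
```

The same arithmetic run with a factor above 1 produces the opposite failure, exploding gradients, which is why controlling gradient flow matters in both directions.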
An LSTM neural network can overcome gradient dispersion by adding functionality in the form of gates: an input gate, an output gate, and a forget gate. These allow the neural network a much longer memory and greater control over how previous experience affects the network weights. In other words, an LSTM neural network can outperform a traditional RNN because it can control its own learning enough to avoid the gradients becoming too large or too small.
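The three gates can be sketched as follows. This is a toy single-unit LSTM step with hypothetical scalar weights (real implementations use matrices and learned parameters); it shows how the forget, input, and output gates control what the cell keeps, adds, and exposes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a toy single-unit LSTM. p maps names to scalar weights
    (hypothetical values) for the three gates and the candidate state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])  # forget gate: how much old memory to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])  # input gate: how much new info to add
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])  # output gate: how much memory to expose
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # candidate memory
    c = f * c_prev + i * c_tilde   # additive memory update
    h = o * math.tanh(c)           # gated view of the memory
    return h, c

# Run a short sequence with uniform hypothetical weights.
params = {k: 0.5 for k in
          ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc"]}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

The key design choice is the additive update `c = f * c_prev + i * c_tilde`: because the cell state is updated by addition rather than repeated multiplication, gradients can flow back through many steps without shrinking as quickly as in a standard RNN.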
Users often choose to use an LSTM neural network to work with sequential or time series data. But long short-term memory neural networks are also flexible, and you can adapt them to many different training models. A few uses for LSTM neural networks include:
Speech-to-text transcription: LSTM neural networks can transcribe spoken words into written text.
Language translation: LSTM neural networks can quickly translate large amounts of text from one language to another.
Sentiment analysis: LSTM networks can analyze the emotion behind text. For example, LSTM networks can look at mentions of a brand on social media to understand how people are feeling about the brand.
Video analysis: LSTM neural networks can analyze videos and images to understand the features present.
Keyword spotting: An LSTM neural network can recognize spoken “keywords” to wake up a system. For example, you might speak a particular phrase before issuing a command to your voice assistant.
Long short-term memory networks can offer benefits in industries as diverse as drilling, water management, supply chains, and infectious disease prediction. Let’s look at these use cases for this technology more closely.
Drilling: Drilling operations, such as those set up in oilfields, use LSTM neural networks to optimize the rate of penetration, or how fast the drilling equipment penetrates the earth. Variables like hydraulics, lithology, and the amount of weight pressing down on the bit all affect the drilling rate. A neural network can consider these variables together and present an optimized model for the best penetration rate.
Water management: Evaporation is an important factor that meteorologists and other scientists need to consider for water resource management. However, gauging evaporation correctly can be difficult for a variety of reasons, including changeable weather conditions and the difficulty of managing measurement equipment. Scientists can use LSTM neural networks to project and model evaporation, leading to more precise water resource models.
Supply chain management: LSTM neural networks can detect anomalies to determine data that should be disregarded when making decisions. For example, they can reduce decision time in industries like supply chain management, where a lot of data continuously flows in, but not all of it is relevant and actionable.
Infectious disease modeling: Scientists used LSTM neural networks to create models to demonstrate the spread of COVID-19. Hospitals and governments used these models to prepare for surging case numbers.
Long short-term memory neural networks offer flexibility, improved memory performance, and the ability to overcome problems associated with gradient dispersion. A few of the other benefits of LSTM networks are:
Better handling of long-term dependencies: LSTMs manage long-term dependencies well because the forget gate allows the network to discard irrelevant information.
More efficient learning: Because the gates control what information is kept or discarded at each step, an LSTM does not need to carry every past signal through every update the way a standard RNN does, which can make training faster and more stable.
High accuracy for prediction: The ability to recall the past gives the neural network more data to make more accurate predictions.
Yet, long short-term memory networks also have limitations that you should be aware of. For example, they are prone to overfitting, another common neural network problem. This happens when the neural network specializes too closely in the training data and cannot adapt and generalize to new inputs.
Another challenge with LSTM neural networks is that they require more computational power and memory. This is partly due to their complexity. Several strategies can help you overcome this problem, including intentionally keeping the complexity lower or using other technologies to supplement the neural network.
To continue the conversation, consider enrolling in a specialization to learn more and take your skills to the next level. The Deep Learning Specialization offered by Deep Learning.AI on Coursera is a five-course series that will help you learn more about artificial neural networks, including convolutional and recurrent networks. You will develop skills in working with RNNs, training test sets, and natural language processing.
Editorial Team
Coursera’s editorial team comprises highly experienced professional editors, writers, and fact...
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.