The Tanh Activation Function in Deep Learning

Written by Coursera Staff • Updated on

Learn about the tanh activation function and the role it plays in machine learning and artificial intelligence. Explore features, limitations, and common applications.


Key takeaways

The tanh activation function is a nonlinear function that can help accelerate neural network learning.

  • Activation functions in neural networks determine whether, and how strongly, each neuron activates, which shapes how information flows through the model.

  • Situations that suit the tanh activation function include building long short-term memory (LSTM) networks and addressing convergence issues you encounter with sigmoid activations.

  • You can use various types of activation functions, such as sigmoid and softmax activation functions, depending on which best suits the needs of your model.

Discover the role of the tanh activation function in neural networks and deep learning. If you’re interested in learning more about deep learning, the Deep Learning Specialization from DeepLearning.AI will give you the opportunity to build and train neural networks and work with natural language processing and transformer models.

Deep learning and neural network basics

Deep learning is a type of machine learning that uses artificial neural networks to mimic how the human brain operates. When you have a thought or make a decision, the neurons in your brain fire in a particular pattern, and this pattern differs depending on your actions and thoughts. For example, when you are excited, your neurons fire along a different pathway than when you are angry. This system of neural pathways allows humans to have complex thoughts, learn, and reason through different ideas, which has historically differentiated us from machines.

In recent years, advances in machine learning have made it possible to mimic these networks, enabling machines to learn and process information independently to generate insights. These algorithms use large data sets to recognize patterns, make predictions, and guide decision-making. As a result, the technology is now used in fields such as text processing, image analysis, autonomous driving, speech recognition, and research.

The role of activation functions 

One key aspect of a neural network is that information can travel through many possible paths. An activation function helps determine that path: applied to the weighted sum of a neuron's (node's) inputs, it decides whether, and how strongly, the neuron “fires.” By determining which neurons activate, activation functions shape how the signal flows from input to output and allow neural networks to compute highly complex decisions. Without them, a neural network would essentially collapse into a linear regression model, because stacking linear layers only produces another linear function, so the network could not learn complex or nonlinear relationships in the data.
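To make the “collapse into a linear model” point concrete, here is a minimal sketch using hypothetical one-dimensional layers of the form y = w * x + b (the weights are made up for illustration). Two stacked linear layers always equal a single linear layer; inserting tanh between them breaks that equivalence.

```python
import math

# Hypothetical 1-D "layers" of the form y = w * x + b
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    # Layer 2 applied to layer 1, with no activation in between
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    # Algebraically identical single layer: w = w2 * w1, b = w2 * b1 + b2
    return (w2 * w1) * x + (w2 * b1 + b2)

def with_tanh(x):
    # Placing tanh between the layers makes the composition nonlinear
    return w2 * math.tanh(w1 * x + b1) + b2

for x in (-1.0, 0.0, 1.0):
    print(x, two_linear_layers(x), one_linear_layer(x), round(with_tanh(x), 4))
```

The first two columns always match, no matter how many linear layers you stack; only the tanh version can bend its output in ways a single straight line cannot.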

You have several options for this function, including the sigmoid, ReLU, softmax, and tanh activation functions. The right choice for training your model depends on which function's advantages and limitations best suit your goals.

Softmax activation function

The softmax activation function typically appears in the last layer of a neural network and converts output scores into probabilities for multi-class classification use cases, such as recommender systems and image recognition tasks.
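As an illustration, the sketch below converts a list of raw output scores into probabilities using the standard softmax formula, exp(s_i) / sum_j exp(s_j). The three scores are hypothetical, and subtracting the largest score first is a common numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(scores):
    """Convert raw output scores (logits) into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow in exp() without changing the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer scores for a three-class problem
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

The class with the highest score receives the highest probability, and the outputs always sum to one, which is why softmax is a natural fit for multi-class outputs.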

What is the tanh activation function?

The tanh, or hyperbolic tangent, activation function is a nonlinear function that outputs values in the range of (-1, 1). Mathematically, you can represent this function as: 

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
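Translating the formula directly into code is a quick way to see its (-1, 1) output range. This is an illustrative sketch only. It compares the hand-written version against Python's built-in `math.tanh`, which you would normally use instead, because `math.exp` overflows for very large inputs.

```python
import math

def tanh_from_definition(z):
    # Direct translation of (e^z - e^-z) / (e^z + e^-z).
    # math.exp overflows for |z| above roughly 710, so use math.tanh in practice.
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, round(tanh_from_definition(z), 4), round(math.tanh(z), 4))
```

Negative inputs map to negative outputs, zero maps to zero, and large inputs approach, but never reach, -1 or 1.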

The tanh activation function is similar to the sigmoid activation function, another activation function with an S-shaped (sigmoid) curve. One difference is that the sigmoid function outputs values in the range (0, 1), which makes it suitable for modeling probabilities, while the tanh function spans (-1, 1). Because this output range is symmetric, tanh is centered around zero, which can sometimes accelerate the learning of neural network models. In fact, tanh is a rescaled sigmoid: if σ is the sigmoid function, then tanh(z) = 2σ(2z) - 1. In other words, you can obtain tanh by stretching the sigmoid's input by a factor of two, doubling its output, and shifting it down by one.
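A short numerical check of the standard identity tanh(z) = 2σ(2z) - 1, where σ(z) = 1 / (1 + e^(-z)) is the sigmoid. This is a sketch for intuition, comparing the rescaled sigmoid against `math.tanh` at a few sample points.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh_via_sigmoid(z):
    # Stretch the input by 2, then map the sigmoid's (0, 1) range onto (-1, 1)
    return 2.0 * sigmoid(2.0 * z) - 1.0

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, round(tanh_via_sigmoid(z), 6), round(math.tanh(z), 6))
```

Both columns agree, which is why the two functions behave so similarly apart from their output range and centering.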

Hard tanh activation function

The hard tanh activation function replaces the smooth S-shaped curve with a piecewise-linear approximation: it passes inputs between -1 and 1 through unchanged and clips everything else to -1 or 1. This reduces computational load and improves speed, so if you are prioritizing cost, hard tanh might be worth considering. However, the function saturates for inputs above 1 or below -1, where its gradient is zero, which may pose a problem depending on your data.
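Here is a minimal sketch of hard tanh, assuming the common definition that clips to [-1, 1] (some libraries let you change these bounds). It also shows the gradient, which is 1 inside the linear region and 0 in the saturated regions. For simplicity this sketch treats the boundary points ±1 as saturated.

```python
import math

def hard_tanh(z):
    # Identity on [-1, 1]; clipped to -1 or 1 outside that range
    return max(-1.0, min(1.0, z))

def hard_tanh_grad(z):
    # 1 in the linear region, 0 where the function is saturated (|z| >= 1 here)
    return 1.0 if -1.0 < z < 1.0 else 0.0

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(z, hard_tanh(z), round(math.tanh(z), 4), hard_tanh_grad(z))
```

The comparison column shows how closely the cheap min/max operations track the smooth tanh near zero, and how no gradient flows once inputs fall outside [-1, 1].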

Limitations of tanh activation function

The tanh activation function shares some limitations with the sigmoid function, including trouble during backpropagation. Backpropagation is the process in deep learning where a model works backward through the network, from the output to the input, to determine how each weight contributed to the error. It applies the chain rule to take a derivative at each step, and the resulting gradients are used to adjust the weights of each node, altering activation pathways.

Tanh does not always backpropagate effectively. For inputs far from zero, the function saturates near -1 or 1 and its derivative approaches zero. Because backpropagation multiplies these derivatives across layers, the gradient can shrink toward zero before it reaches the early layers (known as the vanishing gradient problem), which slows or stops the network's learning. Combining tanh with ReLU to create the ReLTanh activation function can reduce this problem.
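The sketch below illustrates both halves of the vanishing gradient problem: tanh's derivative, 1 - tanh^2(z), shrinks as |z| grows, and the chain rule multiplies these small numbers together. The 10-layer network with every pre-activation at z = 2 is a hypothetical worst case chosen for illustration, and the sketch ignores the weight terms that would also appear in the product.

```python
import math

def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2
    return 1.0 - math.tanh(z) ** 2

# The local gradient shrinks as the input moves into the saturated region
for z in (0.0, 1.0, 2.0, 5.0):
    print(f"z={z}: tanh'(z)={tanh_grad(z):.6f}")

# Backpropagation multiplies local derivatives layer by layer (chain rule).
# Hypothetical case: 10 layers whose pre-activations all sit at z = 2.
layers = 10
grad = 1.0
for _ in range(layers):
    grad *= tanh_grad(2.0)
print(f"gradient after {layers} saturated layers: {grad:.2e}")
```

Even though each local derivative is only moderately small, their product becomes vanishingly tiny, leaving the early layers with almost no signal to learn from.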

When to use tanh activation function

The tanh activation function suits several applications. It may be a good fit if:

You have zero-centered data.

Since tanh outputs range from negative one to one (-1, 1) and are zero-centered, you might choose it for layers where keeping data centered around zero can accelerate learning.

You want to minimize the risk of a vanishing gradient. 

Though you can still encounter the vanishing gradient problem with tanh in very deep networks, tanh is generally preferred over the sigmoid function in hidden layers because it propagates gradients more effectively. The derivative of the tanh function is 1 - tanh^2(z), which takes values in (0, 1] and peaks at 1 when z = 0. The sigmoid's derivative, by comparison, peaks at only 0.25, so for inputs near zero tanh passes back a much larger gradient, making it less prone to this issue.
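To compare the two derivatives side by side, this sketch evaluates 1 - tanh^2(z) and the sigmoid derivative σ(z)(1 - σ(z)) at a few inputs near zero. It is intended only to show the difference in peak gradient. Far from zero, both derivatives shrink toward zero.

```python
import math

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

for z in (0.0, 0.5, 1.0):
    print(f"z={z}: tanh'={tanh_grad(z):.4f}  sigmoid'={sigmoid_grad(z):.4f}")
```

At z = 0, tanh's derivative is 1 while the sigmoid's is 0.25, a fourfold difference that compounds across every layer the gradient passes through.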

You want your model to converge quickly.

If you are using a sigmoid activation function and experience convergence issues, the tanh function might help. Tanh activations behave similarly to sigmoid activations but often converge more quickly. A common explanation is that sigmoid outputs are always positive, so the gradients for all of a neuron's incoming weights share the same sign, and updates can zigzag toward a solution. Tanh outputs are zero-centered, so those gradients can take mixed signs. This gives weight updates a more consistent direction during training and often leads to faster convergence.
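The sign argument can be checked with a few numbers. For a neuron computing sum(w_i * x_i), the gradient for each weight is the upstream gradient multiplied by that weight's input x_i. The pre-activations and upstream gradient below are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activations from the previous layer
pre_acts = [-2.0, -0.5, 0.5, 2.0]
# Hypothetical upstream gradient dL/d(output) for one neuron in the next layer
upstream = 0.3

# dL/dw_i = upstream * x_i, so each weight gradient takes the sign of its input
sig_inputs = [sigmoid(z) for z in pre_acts]     # all positive
tanh_inputs = [math.tanh(z) for z in pre_acts]  # mixed signs

sig_grads = [upstream * x for x in sig_inputs]
tanh_grads = [upstream * x for x in tanh_inputs]

print("sigmoid-fed weight gradients:", [round(g, 3) for g in sig_grads])
print("tanh-fed weight gradients:   ", [round(g, 3) for g in tanh_grads])
```

With sigmoid inputs, every weight gradient has the same sign, so all of that neuron's weights must move in the same direction in a single step. Tanh inputs remove that constraint.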

You are working with a long short-term memory (LSTM) network. 

This type of network is a recurrent neural network (RNN) designed to model sequential data, such as time series, while mitigating the vanishing gradient problem. Tanh has typically been the go-to activation in LSTM models. The standard LSTM cell uses tanh to compute candidate cell-state values and to squash the cell state before output, while sigmoid functions control its gates. Tanh activations have shown high accuracy with this network type on both training and testing data.
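To show where tanh sits inside an LSTM, here is a minimal scalar sketch of one step of a standard LSTM cell without peephole connections. The weights in `params` and the input sequence are hypothetical. Real implementations use vectors, weight matrices, and learned parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. p maps each gate to hypothetical (w_x, w_h, b)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # gates use sigmoid: values in (0, 1)
    i = sigmoid(pre("input"))
    o = sigmoid(pre("output"))
    g = math.tanh(pre("candidate"))  # candidate cell value uses tanh: values in (-1, 1)

    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # tanh squashes the cell state before output
    return h, c

params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, -0.2, 0.1),
    "output": (0.4, 0.3, 0.0),
    "candidate": (1.0, 0.5, -0.1),
}

h, c = 0.0, 0.0
for x in [0.5, -1.0, 2.0]:  # hypothetical input sequence
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

The sigmoid gates decide how much information to keep or pass on, while tanh supplies signed, bounded values for the cell state and the hidden output.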




This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.