Dropout in Neural Networks: Enhancing Model Robustness

Written by Coursera Staff

Explore the significance of dropout in neural networks, how it improves model generalization, and other practical regularization techniques in machine learning.

Neural network dropout is a machine learning technique in which a developer randomly drops certain nodes and their connections out of a neural network during training in order to regularize learning and curb overfitting.

Much of modern machine learning occurs via neural networks—adaptive structures loosely modeled on the human brain. Neural networks consist of an input layer, at least one hidden layer, and an output layer, each layer being composed of nodes. Nodes are artificial neurons that transmit data within a neural network. By means of these neural networks, machine learning and artificial intelligence (AI) models take in information, adapt to it, make mistakes on the way to a fuller understanding of it, and then predict outcomes in a human-like way.

Dropout is one method of machine learning regularization—the process of reducing a model’s generalization error. During dropout, nodes and their connections randomly “drop out” of the input and hidden layers of a neural network during the AI training process. Dropout is key to preventing overfitting, which occurs when a machine learning application absorbs too much noise (random, unhelpful fluctuations in the training data) and treats it as meaningful, leading to incorrect learning during the training process. A noise-filled, overfitted machine learning program learns to predict what it’s been taught but not to make new predictions based on that data.
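
For intuition, here is a minimal sketch, assuming PyTorch is installed, of what dropout does to a layer’s activations during training: a random fraction of values is zeroed, and the survivors are rescaled so the layer’s expected output stays the same.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
activations = torch.ones(8)  # stand-in for the outputs of a hidden layer

# With p=0.5 and training=True, roughly half the values are zeroed at random;
# the survivors are scaled by 1/(1-p) so the expected sum is unchanged.
print(F.dropout(activations, p=0.5, training=True))

# At inference time (training=False), dropout is a no-op and the values pass through untouched.
print(F.dropout(activations, p=0.5, training=False))
```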

In short, dropout reduces the sort of machine learning overcomplication that, if not addressed, results in a model that can’t generalize beyond what it has already seen.

Theoretical foundations of dropout

It’s worth considering dropout in terms of ensemble learning. Ensemble learning is the technique of combining two or more different machine learning models, typically resulting in better predictive performance than any single model alone. It’s a two-heads-are-better-than-one theory.

The purpose of ensemble learning is to reduce prediction error, which is commonly analyzed through the bias-variance tradeoff. This concept involves three elements:

  • Bias: Bias refers to training error that appears as the gap between predicted and true values. The higher the bias in a machine learning algorithm, the less accurately the model can make predictions. You address bias via optimization, which increases a training model’s accuracy. 

  • Variance: Variance occurs when an AI model is too sensitive to small fluctuations in a training data set. In other words, a variance-heavy AI model picks up too much noise, thinking it’s the same as the meaningful data values you’re trying to train it on. This means it’s “thinking” based on error-ridden input data. Developers attempt to solve this problem via generalization, which refers to AI’s capacity to apply data it has already learned and output new, accurate information. 

  • Irreducible error: Large data sets contain an inherent amount of randomness, which results in further learning errors that can be difficult or impossible to reduce. 
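
Taken together, these three elements make up the standard decomposition of a model’s expected prediction error, often summarized as: expected error = bias² + variance + irreducible error. Regularization techniques such as dropout mainly target the variance term.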

What dropout does in terms of ensemble learning is reduce co-adaptation. Separate machine learning models that are trained together on the same data may adapt in the same ways. These systems may adapt to accept the same statistical noise, which means that neither learns properly. Because dropout is random, each training pass effectively runs on a different “thinned” version of the network, so training behaves like an ensemble of models that each develop different capabilities instead of co-adapting to the same noise. This is similar to the way in which people with different educational backgrounds and experiences may try to solve the same problem together, each bringing their own distinct skills to develop a solution.
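
To make the ensemble analogy concrete, the sketch below (assuming PyTorch is installed; the layer sizes and input are placeholders) keeps dropout active at prediction time and averages several stochastic forward passes, which behaves like averaging a small ensemble of thinned networks. This trick is sometimes called Monte Carlo dropout; it isn’t the standard way to run inference, just an illustration of the ensemble view.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny network with one dropout layer; sizes are illustrative only.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(32, 1),
)
x = torch.randn(1, 10)

# Keep dropout switched on and average several stochastic forward passes,
# approximating an ensemble of randomly "thinned" sub-networks.
model.train()
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(20)])
print("ensemble-style average:", preds.mean().item())

# Standard inference instead disables dropout and uses the full network
# deterministically (PyTorch scales activations during training, so no
# extra rescaling is needed here).
model.eval()
with torch.no_grad():
    print("standard inference:", model(x).item())
```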

Practical implementation of dropout

You can think of machine learning via neural networks as an enormously complex autocorrect feature. An algorithm doesn’t “learn” the way a human being does. A machine learning system learns by taking in massive amounts of data, regularizing the learning process, and refining its output capabilities. For example, an ML system completes a sentence in a human-like way by estimating which word is statistically most likely to follow the previous one.

You can picture this probability-based learning modality as a decision tree. Decision trees describe the actions certain root nodes can take via decision nodes, which lead to leaf nodes. Decision tree branches are lines or arrows showing the flow of a choice from a root to a decision node. A branch is the program asking if the answer to a root node’s question is yes or no, resulting in separate leaf nodes for each answer. 

The relative likelihood of roots, decisions, and leaves being logically connected is weighted. Weight is the numerical value assigned to the connections between nodes in a neural network. The higher the weight, the more strongly the output of one node influences the node it connects to. Weights start out randomized, but as learning continues, their values are adjusted, which results in greater accuracy in predictions.
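
As a small illustration of what a weight does, the sketch below (the numbers are made up purely for demonstration) computes one node’s output as a weighted sum of its inputs passed through an activation function; the larger a weight’s magnitude, the more its input sways the result.

```python
import torch

inputs = torch.tensor([0.2, 0.9, 0.5])    # outputs of three upstream nodes
weights = torch.tensor([0.1, 1.5, -0.3])  # connection weights (illustrative values)
bias = 0.05

# A node's pre-activation is the weighted sum of its inputs plus a bias;
# the nonlinearity (here ReLU) then produces the node's output.
pre_activation = torch.dot(inputs, weights) + bias
output = torch.relu(pre_activation)
print(output.item())
```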

Dropout comes in when programmers don’t see the accuracy they want from a training model. You determine dropout rates via hyperparameters—parameters set prior to commencing the machine learning process. Certain hyperparameters to consider include: 

  • Learning rate—the size of each weight update the model makes during training

  • Momentum—how much previous weight updates carry over into the current one, helping the algorithm push past noise toward correct answers

  • Number of epochs—how many times the AI will go through the entire data set

  • Number of decision tree branches 
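
As a rough illustration of where these hyperparameters appear in code, here is a minimal PyTorch sketch; the specific values, layer sizes, and data are placeholders rather than recommendations.

```python
import torch
import torch.nn as nn

# Hyperparameters are fixed before training begins.
learning_rate = 0.01
momentum = 0.9
num_epochs = 10
dropout_rate = 0.2

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(dropout_rate),
    nn.Linear(64, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
loss_fn = nn.CrossEntropyLoss()

# One epoch = one full pass over the data set (a single random batch stands in for it here).
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
for epoch in range(num_epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```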

Here are some tips for implementing dropout: 

  • Start with a 20 percent dropout rate; don’t go higher than 50 percent

  • Use a larger learning rate and a high momentum of between 0.9 and 0.99

  • Constrain the size of the network’s weights, for example with a max-norm constraint of 4 or 5

  • Utilize dropout on both the input and hidden layers
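
One way these tips might look in PyTorch is sketched below. It is not a definitive recipe: the layer sizes, learning rate, and the max-norm helper are illustrative assumptions (PyTorch has no built-in max-norm option for linear layers, so a small helper applies it after each update).

```python
import torch
import torch.nn as nn

class DropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_dropout = nn.Dropout(p=0.2)   # ~20% dropout on the input layer
        self.fc1 = nn.Linear(784, 256)
        self.hidden_dropout = nn.Dropout(p=0.5)  # up to 50% on a hidden layer
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.input_dropout(x)
        x = torch.relu(self.fc1(x))
        x = self.hidden_dropout(x)
        return self.fc2(x)

def apply_max_norm(module, max_norm=4.0):
    # Rescale any weight row whose L2 norm exceeds max_norm back down to max_norm.
    for name, param in module.named_parameters():
        if "weight" in name:
            with torch.no_grad():
                norms = param.norm(2, dim=1, keepdim=True).clamp(min=1e-12)
                param.mul_(norms.clamp(max=max_norm) / norms)

model = DropoutNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.95)  # high momentum, as above

# One illustrative training step on random placeholder data.
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
apply_max_norm(model)
```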

The PyTorch framework also lets you train your machine learning model with fault tolerance. This is the idea that a training program can detect, and even recover from, faults during the learning process without shutting down and starting over from scratch.
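
One common, minimal form of fault tolerance is periodic checkpointing, so that training can resume from the last saved state instead of restarting. A sketch, assuming a small placeholder model and optimizer (PyTorch also offers more complete tooling for elastic, fault-tolerant distributed training):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
epoch = 5                                                  # pretend we just finished epoch 5

# Periodically save a checkpoint so training can resume after a crash.
torch.save(
    {
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    },
    "checkpoint.pt",
)

# Later (possibly in a new process), restore the state and continue training.
restored = torch.load("checkpoint.pt")
model.load_state_dict(restored["model_state"])
optimizer.load_state_dict(restored["optimizer_state"])
start_epoch = restored["epoch"] + 1
```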

Benefits of dropout in neural networks

Dropout temporarily removes nodes from a neural network so that the network does not become overly dependent on any single pathway through the data. This regularization method prevents overfitting at a low computational cost. Dropout has been shown to improve: 

  • Image classification

  • Algorithm performance 

  • Speech recognition

Other regularization techniques

In order to optimize your machine learning performance, you may want to consider pairing dropout with other regularization techniques, such as: 

Data augmentation

During data augmentation, you increase the size of your data training set by working artificial data samples into the training process. By exposing your machine learning algorithm to a diversity of uncommon data in addition to your original data sets, you train it to adapt to data variations in a more sophisticated way. 
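
For image data, augmentation is often written as a transform pipeline. Below is a minimal sketch using torchvision (assuming it is installed; the particular transforms and their strengths are illustrative). Passing the same image through the pipeline several times produces slightly different tensors each time, which is where the extra variety comes from.

```python
from torchvision import transforms

# Each training image is randomly flipped, rotated, and color-jittered,
# so the model sees a slightly different variant of it every epoch.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```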

Early stopping

Early stopping allows you to halt the training process before the model begins memorizing irrelevant inputs. Instead of allowing an automated learning program to continue unchecked, which could lead to overfitting, early stopping ends training when machine learning performance starts to slip.
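
In code, early stopping is usually a small wrapper around the training loop: track the best validation loss and stop once it has not improved for a set number of epochs. In the sketch below, train_one_epoch, validation_loss, and model are hypothetical stand-ins for a real training pipeline, and the patience value is arbitrary.

```python
# Hypothetical helpers: train_one_epoch(model) runs one pass over the training data,
# validation_loss(model) returns the loss on a held-out validation set.
best_loss = float("inf")
patience = 5                      # how many non-improving epochs to tolerate before stopping
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(model)
    val_loss = validation_loss(model)
    if val_loss < best_loss:
        best_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break
```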

Noise injection

Noise injection prevents overfitting by adding noise to input data during the training process. Normally, noise is bad, but by deliberately injecting small, controlled amounts of it during training, you can teach your machine learning model to be relatively insensitive to noise. Think of it in terms of exposure therapy: you’re introduced to a phobia by degrees, and by degrees, you learn not to be afraid of it.
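
A minimal way to inject noise is to perturb each training batch with zero-mean Gaussian noise before it reaches the model. A sketch, with a placeholder batch and an arbitrary noise level:

```python
import torch

def add_input_noise(batch, std=0.1):
    # Return the batch with zero-mean Gaussian noise added to every value.
    return batch + torch.randn_like(batch) * std

clean_batch = torch.randn(32, 20)           # placeholder training batch
noisy_batch = add_input_noise(clean_batch)  # the model trains on this noisy copy,
                                            # learning to tolerate small perturbations
```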

Next steps with neural networks

Proper machine learning regularization leads to better predictive results. Coursera can help you take on the challenges inherent in machine learning training. You can learn more about dropout and other regularization techniques through Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization provided by DeepLearning.AI. From there, you can move on to Imperial College London’s Getting Started with TensorFlow 2. 
