Bias vs. Variance in Machine Learning: What’s the Difference?

Written by Coursera Staff

Bias and variance are both prediction errors in machine learning. Learn more about the tradeoffs associated with minimizing bias and variance in machine learning.


Machine learning models learn patterns from data in order to predict trends and surface insights, so some amount of prediction error is unavoidable. Bias and variance are the two main sources of prediction error in machine learning, and there is a tradeoff involved in minimizing them.

Understanding these errors helps machine learning practitioners build better, more accurate models while avoiding both overfitting and underfitting. This article reviews each error and what happens when you attempt to minimize bias and variance in a machine learning model.

Bias vs. variance in machine learning: what’s the difference?

The key difference is that bias is error caused by overly simple or incorrect assumptions in the model, while variance is error caused by the model's high sensitivity to fluctuations in the training data. Because the goal of a model is to make accurate predictions, either error can lead to misleading results if left unaddressed.

What is bias in machine learning?

Bias in machine learning is an error that skews a model's results, either in favor of or against a particular pattern in the data. It arises from incorrect or oversimplified assumptions made during training, and it shows up as the gap between the model's average prediction and the true value.

A machine learning model with high bias will not fit the data set closely. It tends to miss trends in the data, is too simplistic, and has a high rate of error.
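To make that concrete, here is a minimal sketch (not from the article) of a high-bias model: a straight-line fit to data generated from a curve. The sine-shaped synthetic data and the use of scikit-learn are illustrative assumptions.

```python
# High bias: a straight-line model is too simple to capture a curved
# relationship, so its error stays high even on the data it was trained on.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # nonlinear target

model = LinearRegression().fit(X, y)  # high-bias model: assumes a straight line
print("training MSE:", mean_squared_error(y, model.predict(X)))
```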

What is variance in machine learning?

Variance in machine learning is an error that occurs when a model's predictions change depending on which subset of the training data it's trained on. It typically appears in very complex models with many features; the more complicated the model, the more variance it tends to have.

When a machine learning model has high variance, it may be due to “noise” in the data set (data within the set that is corrupted, meaningless, or irrelevant). It may also happen when the model is fit so flexibly that it chases individual data points rather than the underlying pattern.
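Here is a minimal, illustrative sketch of high variance, assuming scikit-learn and a synthetic noisy data set: the same fully grown decision tree, trained on two different random subsets of the data, can predict noticeably different values for the same input.

```python
# High variance: a very flexible model's predictions depend heavily on
# which subset of the training data it happened to see.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)  # noisy target

x_query = np.array([[1.5]])  # one fixed input to compare predictions on
for seed in (0, 1):
    idx = np.random.default_rng(seed).choice(300, size=150, replace=False)
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])  # unpruned: very flexible
    print(f"trained on subset {seed}, prediction:", tree.predict(x_query))
```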

The bias-variance tradeoff explained 

The bias-variance tradeoff refers to the inverse relationship between bias and variance in a machine learning model and the challenge of finding the right balance between the two. It concerns how complex a model is, how accurate its predictions are, and how well the model predicts on data it wasn't trained on. Generally, it's desirable to have low bias, which happens when the model has more flexible parameters. The tradeoff is that flexible models show more variance each time a new sample is used as a training set, so a balance needs to be found.

In practice, models with high bias tend to have low variance, and models with low bias tend to have high variance.
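One way to see the tradeoff is to estimate bias and variance empirically. The sketch below (an illustration, not a formal method from the article) trains the same model on many freshly sampled training sets, then measures how far the average prediction sits from the truth (squared bias) and how much individual predictions scatter around that average (variance). The synthetic sine data and scikit-learn's DecisionTreeRegressor are assumptions; raising max_depth typically lowers the bias term and raises the variance term.

```python
# Empirical bias/variance estimate: retrain the same model on many
# independent training sets and decompose the error at fixed test points.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
x_test = np.linspace(-3, 3, 50).reshape(-1, 1)
y_true = np.sin(x_test).ravel()  # noiseless ground truth at the test points

preds = []
for _ in range(200):  # 200 freshly sampled training sets
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)
    model = DecisionTreeRegressor(max_depth=2)  # try max_depth=None: less bias, more variance
    preds.append(model.fit(X, y).predict(x_test))

preds = np.array(preds)  # shape: (200 models, 50 test points)
bias_sq = np.mean((preds.mean(axis=0) - y_true) ** 2)  # avg prediction vs. truth
variance = np.mean(preds.var(axis=0))  # scatter of predictions around their mean
print(f"bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```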

Underfitting and overfitting explained

Machine learning models with high variance and low bias fit the training data closely but are prone to overfitting.

Overfitting refers to instances in which the model learns patterns that don't exist beyond its training data. It happens when a highly complex model follows the training data too closely, matching its individual points, noise included. As a result, the model can't recognize the same patterns in new, unseen data and is unable to predict accurate outcomes.

Models with high bias and low variance are prone to underfitting: the model is too simple to register the complexities in the training data.

Underfitting refers to when the model cannot capture the relationship between the input data and the target values. It happens when the model isn't complex enough to match the target data and, because of its high bias, doesn't follow the training data closely enough.

In short, high bias leads to underfitting, and high variance leads to overfitting. Underfitting (bias) is a problem because the model isn't accurate even on its own training data. Overfitting (variance) is a problem because the model's predictions swing with each new sample, so it can't make accurate predictions on new data.
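A common way to diagnose which problem you have is to compare training error with error on held-out data as model complexity grows. The sketch below, assuming scikit-learn and synthetic data, sweeps the degree of a polynomial model: a degree that's too low scores badly on both sets (underfitting), while a degree that's too high scores well on the training data but worse on the held-out set (overfitting).

```python
# Diagnosing under- vs. overfitting: sweep model complexity and compare
# training error with error on data the model never saw.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=120)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f}, "
          f"test MSE {mean_squared_error(y_test, model.predict(X_test)):.3f}")
```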

Start advancing your machine learning skills today

Learning the nuances of error in machine learning is an important skill. If you want to advance your career in machine learning, consider enrolling in the Machine Learning Specialization, offered in partnership by Stanford University and DeepLearning.AI. Start your machine learning path and build your skills with a free 7-day trial of a Coursera Plus subscription.

Coursera Plus
Build job-ready skills with a Coursera Plus subscription
  • Get access to 10,000+ learning programs from world-class universities and companies, including Google, Yale, Salesforce, and more
  • Try different courses and find your best fit at no additional cost
  • Earn certificates for learning programs you complete
  • A subscription price of $59/month, cancel anytime


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.