What Is Linear Regression? (Types, Examples, Careers)

Written by Coursera Staff • Updated on

Linear regression is a very common statistical technique used in industries such as medicine, sports, environmental science, and finance. Explore what linear regression is, why many professionals benefit from this method, and how it may be useful for you.


Key Takeaways

Professionals use linear regression across many industries to make predictions, inform business decisions, prepare for upcoming events, and explore answers to research questions. Here are some important facts to know:

  • When you perform a regression analysis, your regression equation provides a way to predict future outcomes based on the information you currently have.

  • In linear regression, you’re trying to find the “best fit” line to represent the relationship between your variables.

  • You can perform linear regression by hand or with the help of statistical software.

You can delve into the intricacies of linear regression throughout this article, including its definition, various types of linear regression, and how different careers utilize this statistical tool. As a beginner, you can start with Linear Regression and Modeling by Duke University.

What is regression analysis?

Regression analysis is a statistical method that allows data analysts to understand the relationship between two or more variables. Before diving into linear regression, it’s important to understand a few key definitions:

  • Dependent variable: The dependent variable, or response variable, is the variable you’re interested in understanding or predicting. For instance, it could be something like the score a student gets on a test.

  • Independent variables: The independent variables, or explanatory variables, are variables that you think might affect your dependent variable. In the above example, these factors could include the number of hours the student studied, their prior knowledge, the number of hours they slept, and so on.

  • Regression equation: The regression equation is the formula that tries to express how your independent variables (like studying, sleep, etc.) relate to your dependent variable (the test score). 

When you perform a regression analysis, your regression equation provides a way to predict future outcomes based on the information you currently have. For instance, if you had data on how much previous students studied, slept, and scored on their test, you could perform a regression analysis to create an equation that predicts a future student’s test score based on how much they studied and slept. As you gather more data, you can refit the equation to improve the accuracy of its predictions.

What is linear regression? 

Linear regression is a specific type of regression analysis that you use when you expect a straight-line relationship between your independent and dependent variables. This is where the term “linear” in linear regression comes from. You describe the straight line with an equation: Y = a + bX.

  • Y is the dependent variable.

  • X is the independent variable.

  • ‘a’ is the y-intercept, or the value of Y where the line crosses the y-axis (X = 0).

  • ‘b’ is the slope of the line, which indicates how much Y changes when X increases by one unit.

In linear regression, you’re trying to find the “best fit” line to represent the relationship between your variables. “Best fit” typically refers to the line that minimizes the sum of the squared vertical distances between the line and each of your data points (both above and below the line). This is the “least squares” method. Once you have your “best fit” line, you can use it to make predictions. 
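To make the idea of “best fit” concrete, here is a minimal Python sketch that scores candidate lines by their sum of squared errors. The data (hours studied and test scores) is hypothetical, and the function name sum_squared_errors is chosen only for illustration. The values 47.7 and 4.1 are the least squares intercept and slope for this particular dataset.

```python
# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]


def sum_squared_errors(intercept, slope, xs, ys):
    """Add up the squared vertical distances between the line and each point."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# A rough guess at the line.
guess_sse = sum_squared_errors(50.0, 3.0, hours, scores)

# The least squares line for this dataset (intercept 47.7, slope 4.1).
best_sse = sum_squared_errors(47.7, 4.1, hours, scores)

# Nudging the slope away from the least squares value makes the fit worse.
nudged_sse = sum_squared_errors(47.7, 4.2, hours, scores)

print(f"guess:  {guess_sse:.2f}")
print(f"best:   {best_sse:.2f}")
print(f"nudged: {nudged_sse:.2f}")
```

The least squares line has the smallest total of the three, which is what “best fit” means in this context.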

Types of linear regression

In linear regression, you can have one or multiple independent variables. If you only have one, it’s called “simple linear regression.” If you have more than one, it’s called “multiple linear regression.” The more variables you include, the more complex your equation becomes, but the basic idea is the same.

1. Simple linear regression

Simple linear regression is the most basic form of linear regression, involving only one independent variable and one dependent variable. For example, imagine you’re studying the relationship between the number of hours someone exercises per week (independent variable) and their blood pressure (dependent variable).

In simple linear regression, you would model this relationship using the equation Y = a + bX, where:

  • Y is the dependent variable (blood pressure).

  • X is the independent variable (hours exercised).

  • a is the y-intercept (blood pressure with zero hours exercised).

  • b is the slope (how much the blood pressure reading changes for each additional hour exercised).

Simple linear regression aims to find the values of ‘a’ and ‘b’ that produce the line of best fit. This line helps you predict the dependent variable (blood pressure) based on the independent variable (hours exercised).
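As a rough illustration, the best values for ‘a’ and ‘b’ can be computed directly with the standard least squares formulas: b is the covariance of X and Y divided by the variance of X, and a is the mean of Y minus b times the mean of X. The sketch below assumes made-up exercise and blood pressure readings, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How X and Y vary together, and how X varies on its own.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_mean - b * x_mean  # y-intercept
    return a, b


# Hypothetical readings, invented for this example.
hours_exercised = [0, 1, 2, 3, 4, 5]
blood_pressure = [130, 128, 127, 124, 123, 120]

a, b = fit_simple_linear_regression(hours_exercised, blood_pressure)

# Use the fitted line to predict blood pressure at 6 hours of exercise.
predicted = a + b * 6

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"predicted blood pressure at 6 hours: {predicted:.1f}")
```

In this made-up dataset the slope is negative, meaning each additional hour of exercise is associated with a lower predicted reading.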

2. Multiple linear regression

Multiple linear regression is a direct extension of simple linear regression and is used when more than one independent variable is present. Using the same study example, consider both the number of hours exercised and the number of hours slept each night before the blood pressure reading. Now you have two independent variables, so you’re dealing with multiple linear regression.

In this case, the equation would look something like this: Y = a + b1(X1) + b2(X2). In this equation:

  • Y is still the dependent variable (blood pressure).

  • X1 and X2 are the independent variables (hours exercised and hours slept).

  • a is the y-intercept (the blood pressure reading with no exercise or sleep hours).

  • b1 and b2 are the slopes (how much the blood pressure reading changes for each additional hour exercised and each additional hour slept, respectively, while the other variable is held constant).

In multiple linear regression, the objective remains the same: to find the values of ‘a’, ‘b1’, and ‘b2’ that create the best fit for the data. This allows you to predict the blood pressure reading based on both hours exercised and hours slept.
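One common way to find ‘a’, ‘b1’, and ‘b2’ is to solve the so-called normal equations of least squares. The sketch below does this in plain Python for two independent variables. The data is hypothetical and was generated exactly from Y = 140 - 2(X1) - 1.5(X2), so the fit should recover those coefficients. The helper names are illustrative only, and in practice you would normally rely on statistical software instead.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector using Gaussian elimination with pivoting."""
    n = len(vector)
    # Work on an augmented copy so the inputs are left unchanged.
    aug = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_linear_regression(x1, x2, y):
    """Return (a, b1, b2) for the least squares fit of Y = a + b1*X1 + b2*X2."""
    # Each row is [1, X1, X2]; the leading 1 corresponds to the intercept a.
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    size = 3
    # Normal equations: (X^T X) * coefficients = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    xty = [sum(r[i] * target for r, target in zip(rows, y))
           for i in range(size)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical data built from Y = 140 - 2*X1 - 1.5*X2 with no noise.
hours_exercised = [0, 1, 2, 3, 4, 5]
hours_slept = [6, 8, 5, 7, 9, 6]
blood_pressure = [131.0, 126.0, 128.5, 123.5, 118.5, 121.0]

a, b1, b2 = fit_multiple_linear_regression(
    hours_exercised, hours_slept, blood_pressure
)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Real data contains noise, so the fitted coefficients would only approximate the underlying relationship.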

When building your model, you will often have to make choices on which variables to include. As you might guess, the resulting model will vary depending on which variables you include, which is why it is important to think carefully about your model. 

Examples of linear regression 

Linear regression has applications in almost every field. Some ways you might see linear regression in different industries include:

  • Politics: The relationship between state spending and public support

  • Business: The relationship between revenue and employee pay

  • Environment: The relationship between carbon emissions and taxes

  • Sociology: The relationship between professional pay and applicant qualifications

  • Psychology: The relationship between culture and inclusive behaviors

  • Health: The relationship between patient demographics and body weight

  • Education: The relationship between academic grades and geographic location

How to perform linear regression

You can perform linear regression by hand or with the help of statistical software. In practice, software is usually the more effective option, because it can fit both simple and multiple linear regression models and lets you compare models built from different combinations of variables. Some software and programming languages you might consider using for linear regression include R, Python (for example, with the scikit-learn library), MATLAB, Stata, and Excel.


Pros and cons of linear regression

Being aware of the advantages and disadvantages of linear regression can help you decide when the method is appropriate and interpret your findings more accurately.

Advantages

Some advantages you might find include the following:

  • Ease of use: Linear regression is generally considered a straightforward and manageable algorithm that can be applied to various types of computational systems.

  • Simplicity and efficiency: The underlying linear regression technique is relatively simple to understand and quick to compute compared to other machine learning techniques.

  • Modeling linear relationships: Linear regression can effectively model datasets in which the variables have an approximately linear relationship, making it useful for quantifying how one variable changes with another.

  • Making informed insights: With linear regression, you can use your data to explore the relationships between different variables and make predictions based on different values. This helps to inform decision-making, such as optimizing a marketing strategy or allocating the right volume of resources for a project.

Limitations

While powerful when used correctly, linear regression is not appropriate for every use case. By being aware of limitations, you can more effectively determine when this algorithm is the right choice for you. Some of the limitations you might encounter include:

  • Causation vs. correlation: Regression analysis only shows correlation, not causation. Just because two things seem to move together doesn't mean one is directly affecting the other. There might be other hidden factors at play, or it could be a coincidence. It’s always important to use other forms of research and critical thinking to back up your findings from regression analysis.

  • Risk of underfitting: Linear regression may lead to underfitting, which occurs when a model is too simple to capture the underlying pattern in the data and therefore predicts poorly.

  • Restricted to linear relationships: When measuring the relationship between naturally occurring variables, the underlying shape may be nonlinear. Since linear regression assumes a linear relationship between the input and output variables, this type of analysis may fail to accurately fit complex datasets.

  • Sensitivity to outliers: Outliers, or extreme values, can significantly impact linear regression by pulling the line of best fit toward them. This may result in models that do not accurately represent the data.

What careers use linear regression? 

Linear regression is a very common statistical technique, which makes it a popular tool in many professions that draw insights from data. Some careers that use linear regression include:

  • Sports analysts: Sports analysts can use linear regression to predict how certain players or teams will perform based on previous seasons.

  • Marketing analysts: Marketing teams can examine the performance of previous products or campaigns to make informed predictions about future ones.

  • Financial analysts: Financial analysts can forecast how stocks or investments will perform based on a wide variety of factors.

  • Environmentalists: Environmentalists can predict pollution, emissions, and other environmental data based on previous years’ environmental data.
