Correlation vs. Causation: What’s the Difference?

Written by Coursera Staff • Updated on

Learn about correlation versus causation and how to differentiate these terms from one another when describing the relationship between variables.

[Featured text] A woman and man review data looking for evidence of correlation vs causation.

In analytics, correlation and causation both describe relationships between variables. However, the two terms are not interchangeable and have significant differences. Causation indicates that one event causes another, while correlation only identifies that a relationship exists between two events or outcomes.

When two variables respond similarly to an event, you may assume that one event caused the other or that the two are directly connected. However, this isn’t always the case, making it important to be able to distinguish between correlation and causation. 

What is meant by correlation vs. causation?

The question of correlation versus causation is whether two events are simply related or whether one caused the other to happen. It is an important consideration because the presence of a correlation between two variables doesn’t mean one causes the other. When a clear relationship exists between variables, it is tempting to assume that a cause-and-effect relationship is present.

The problem with making this observation is that you may fail to consider other factors or variables that could cause the correlation. The correlation you observe may be causation, as both can be true, but correlation alone isn’t enough to declare causation. 

What is correlation?

Correlation measures the strength and direction of the linear relationship between two variables. In a positive correlation, when the value of one variable goes up, the other tends to go up as well, and when one goes down, the other tends to go down too.

A negative correlation describes the opposite—one variable goes up, and the other goes down, with the two variables moving in opposite directions. If no relationship exists between variables, you would say the correlation is zero.

You can represent the strength of the relationship between variables using a correlation coefficient ranging from -1 to +1. The closer the coefficient is to zero, the weaker the linear relationship:

  • 1 = Perfect positive correlation

  • 0.5 = Weak positive correlation

  • 0 = Zero correlation

  • -0.5 = Weak negative correlation

  • -1 = Perfect negative correlation
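The coefficient described above is most commonly the Pearson correlation coefficient. The following is a minimal sketch of how it can be computed, using only Python's standard library. The function name `pearson_r` and the sample data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance term: how the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, chosen so the relationships are perfectly linear.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up as hours goes up
falling = [10, 8, 6, 4, 2]   # goes down as hours goes up

r_pos = pearson_r(hours, rising)
r_neg = pearson_r(hours, falling)
print(r_pos, r_neg)
```

Because both made-up series are exact straight lines, the two results sit at the extremes of the scale: a perfect positive and a perfect negative correlation.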

You can also use scatter plots to visualise correlations. With a positive correlation, the points on the scatter plot trend upward from left to right; with a negative correlation, they trend downward from left to right. A scatter plot representing variables with no correlation will have points that appear spread throughout the graph with no clear trend. 

Limitations exist regarding how much you can learn from correlations, as correlation alone isn’t enough to prove causation. Additionally, the correlation coefficient only captures linear relationships, so two variables with a strong non-linear relationship can still have a coefficient close to zero. 
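The restriction to linear relationships can be seen in a small example. In the sketch below, `ys` is completely determined by `xs` (it is simply `xs` squared), yet the Pearson coefficient comes out at zero because the relationship is a curve, not a straight line. The helper `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A perfectly predictable but non-linear (U-shaped) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # the falling left half and rising right half cancel out
```

A zero coefficient here does not mean the variables are unrelated, which is one reason a scatter plot is worth checking alongside the number.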

Even when variables are strongly correlated, it doesn’t prove that a change in one variable caused the change in the other. To be able to do that, you must establish causation, which is much more difficult to prove than correlation. 

What is causation?

Causation occurs when one variable is directly responsible for the change in the other. In other words, a change in one variable causes a change in another variable. Proving this relationship tends to be more difficult than correlation and requires experimentation using both independent and controlled variables. 

To prove causation, you need a properly designed experiment that demonstrates these three conditions: 

  • Temporal sequencing: Temporal sequencing states that X, referring to the variable causing the change, comes before Y, the variable that changes.  

  • Non-spurious relationship: A non-spurious relationship means that you can demonstrate, with a high degree of confidence, that the relationship between X and Y is unlikely to have occurred simply by chance.

  • Elimination of alternative causes: By eliminating alternative causes, you are stating that the relationship between X and Y isn’t due to other outside variables that aren’t considered part of the experiment. 

Does correlation imply causation?

Although it’s possible for both correlation and causation to occur at the same time, correlation doesn’t imply causation. This is because the relationship between variables could either be due to a third variable or simply a coincidence. 

Examples of correlation vs. causation

If you were to collect data on the sale of ice cream cones and swimming pools throughout the year, you would likely find a strong positive correlation between the two, as sales of both increase during the summer months. If you make the mistake of assuming correlation implies causation, you would incorrectly claim that an increase in ice cream cone sales causes people to buy swimming pools. However, this isn’t the case since you can attribute the increase in both to another variable, most likely the warmer weather people experience during the summer. So, although a correlation is present, you can't support causation. 
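The ice cream and swimming pool example can be illustrated with a small simulation. This is a sketch built on made-up numbers: daily temperature is assumed to drive both sales figures, and neither product affects the other. The variable names, coefficients, and noise levels are all invented for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical year of data: temperature influences both sales figures.
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
pools = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temperature]

# Raw correlation between the two sales series.
r_raw = pearson_r(ice_cream, pools)

# Correlation once the part explained by temperature is removed from each.
r_adjusted = pearson_r(residuals(ice_cream, temperature),
                       residuals(pools, temperature))

print(f"raw: {r_raw:.2f}, adjusted for temperature: {r_adjusted:.2f}")
```

In this simulated data the raw correlation is strong, but it largely disappears once temperature is accounted for, which is what you would expect when a third variable explains the relationship. With real data the third variable is often unknown or unmeasured, so this kind of adjustment is only possible for factors you have thought to record.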

In another correlation versus causation example, it may not be as easy to identify whether causation is present with two variables. For example, you could find a correlation between the amount someone exercises and their reported happiness levels. While an increase in exercise may be causing an increase in happiness, you can't say for sure that it’s the cause since there could be another unknown variable that significantly influences a person's mood.

Reliable ways to determine causation

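To reliably determine causation, you can perform randomised A/B/n testing, which works like an A/B test but compares any number of variants (A, B, C, and so on) rather than just two. Because participants are assigned to variants at random, other possible factors tend to balance out across the groups, so a difference in outcomes can be attributed to the variant itself. 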
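To show why random assignment matters, here is a minimal simulated A/B/n test. Everything in it is an assumption made for illustration: each user has a hidden `engagement` trait that affects the outcome, and the made-up `TRUE_EFFECT` table says only variant B actually changes anything.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical ground truth: only variant B has a real effect.
TRUE_EFFECT = {"A": 0.0, "B": 1.0, "C": 0.0}
variants = sorted(TRUE_EFFECT)

groups = {v: {"engagement": [], "outcome": []} for v in variants}
for _ in range(3000):
    engagement = rng.gauss(0, 1)    # hidden trait the experimenter never sees
    variant = rng.choice(variants)  # random assignment, independent of the trait
    outcome = 10 + 2 * engagement + TRUE_EFFECT[variant] + rng.gauss(0, 1)
    groups[variant]["engagement"].append(engagement)
    groups[variant]["outcome"].append(outcome)

# Random assignment should leave the hidden trait roughly equal in every group.
engagement_means = {v: mean(g["engagement"]) for v, g in groups.items()}

# Difference in mean outcome against variant A estimates each variant's effect.
baseline = mean(groups["A"]["outcome"])
estimated_effect = {v: mean(g["outcome"]) - baseline for v, g in groups.items()}

print(engagement_means)
print(estimated_effect)
```

Since the hidden trait averages out to roughly the same value in each group, the difference in outcomes between B and A lands near the simulated effect even though `engagement` was never measured or controlled directly.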

The other method for determining causation is through hypothesis testing. In hypothesis testing, you test your primary (alternative) hypothesis against a null hypothesis, which states that the effect or relationship you expect does not exist. If your data provide enough evidence to reject the null hypothesis, you can be more confident that your results aren’t simply due to chance. 
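One simple way to carry out such a test is a permutation test, sketched below. The null hypothesis is that group labels make no difference, so the code shuffles the labels many times and counts how often a difference at least as large as the observed one appears by chance. The function name `permutation_p_value` and both data sets are hypothetical.

```python
import random

def permutation_p_value(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up measurements from a control group and a treatment group.
control = [12, 15, 11, 14, 13, 12, 16, 14]
treatment = [18, 17, 19, 16, 20, 18, 17, 21]

p_value = permutation_p_value(control, treatment)
print(p_value)
```

A small p-value suggests the difference is unlikely to be due to chance alone, which addresses the non-spurious condition. Whether it also supports a causal claim depends on how the data were collected, for example whether subjects were randomly assigned to the two groups.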

Learn more with Coursera 

In analytics, distinguishing between correlation and causation is crucial because correlation only indicates a relationship between variables, while causation confirms that one variable directly influences the other. Establishing causation requires rigorous experimentation to prove that one event leads to another, eliminating the possibility of other influencing factors.

Consider earning a Google Data Analytics Professional Certificate on Coursera to develop important analytical skills, such as data collection, calculations, and analysis. With this certificate, you can qualify for in-demand positions in less than six months, such as a data analyst or junior data analyst. 

[Entity card: https://www.coursera.org/professional-certificates/google-data-analytics]

The University of Colorado Boulder’s Statistical Inference and Hypothesis Testing in Data Science Applications and Data Analysis Tools from Wesleyan University on Coursera are also great courses to learn more about how you can properly implement hypothesis testing.


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.