There are 9 modules in this course
This course introduces the necessary concepts and common techniques for analyzing data. The primary emphasis is on the process of data analysis, including data preparation, descriptive analytics, model training, and result interpretation. The process starts with removing distractions and anomalies, followed by discovering insights, formulating propositions, validating evidence, and finally building professional-grade solutions. Following the process properly, regularly, and transparently brings credibility and increases the impact of the results.
This course covers Exploratory Data Analysis, Feature Screening, Segmentation, Association Rules, Nearest Neighbors, Clustering, Decision Trees, Linear Regression, Logistic Regression, and Performance Evaluation. It also reviews statistical theory, matrix algebra, and computational techniques as necessary.
This course prepares students to carry out the data preparation and analysis process. Besides developing Python code for carrying out the process, students will learn to tune the software tools for the most efficient implementation and optimal performance. At the end of this course, students will have built their own inventory of data analysis code and the confidence to advocate their propositions to business stakeholders.
Required Textbook: This course does not mandate any textbooks because the lecture notes are self-contained.
Optional Materials: A Practitioner's Guide to Machine Learning (abbreviated PGML for Reading)
Software Requirements: Python version 3.11 or above with the latest compatible versions of NumPy, SciPy, Pandas, Scikit-learn, and Statsmodels libraries.
To succeed in this course, learners should possess a basic knowledge of linear algebra and statistics, basic set theory and probability theory, and have basic Python and SQL skills. A few courses that can help equip you with the database knowledge needed for this course are: Introduction to Relational Databases, Relational Database Design, and Relational Database Implementation and Applications.
Welcome to Data Preparation and Analysis! Module 1 guides students through the art of crafting informative and visually appealing histograms, a fundamental aspect of data visualization. Students will learn techniques for measuring the location and scale of data and will examine the origins and impacts of noise and missing values in datasets. This module also introduces the CRISP-DM Process, a structured approach to data mining, along with Gartner's Analytics Ascendancy Model for advanced data analysis. Additionally, students will explore the distinction between raw data and processed information, a key concept for effective data interpretation and decision-making.
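As a rough illustration of the location and scale measures and the histogram binning described in this module, the sketch below uses only the Python standard library. The function names, the sample values, and the choice of equal-width bins are assumptions made for this example; they are not taken from the course materials.

```python
import statistics


def location_and_scale(values):
    """Summarize location and scale, skipping missing values (None)."""
    clean = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(clean, n=4)  # quartile cut points
    return {
        "n_missing": len(values) - len(clean),
        "mean": statistics.mean(clean),      # location, sensitive to outliers
        "median": statistics.median(clean),  # location, robust to outliers
        "stdev": statistics.stdev(clean),    # scale, sensitive to outliers
        "iqr": q3 - q1,                      # scale, robust to outliers
    }


def histogram_counts(values, bin_width, start=0.0):
    """Count observations in equal-width bins keyed by each bin's left edge."""
    counts = {}
    for v in values:
        if v is None:
            continue  # missing values are not binned
        left_edge = start + ((v - start) // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical sample: two missing values and one extreme observation (30.0).
data = [2.0, 3.5, None, 4.0, 4.5, 5.0, 7.5, None, 30.0]
summary = location_and_scale(data)
hist = histogram_counts(data, bin_width=5.0)
```

In this sample the single extreme value pulls the mean well above the median, which is one reason to report a robust measure alongside the mean before choosing histogram bins.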
The First Step of Knowing Your Data - Part 1•8 minutes
The First Step of Knowing Your Data - Part 2•5 minutes
The First Step of Knowing Your Data - Part 3•9 minutes
The First Step of Knowing Your Data - Part 4•10 minutes
7 readings•Total 290 minutes
Syllabus•10 minutes
Data Files•60 minutes
Module 1 Introduction•30 minutes
Big Data and IEEE 754•60 minutes
CRISP-DM2•60 minutes
Selecting the Bin Size of a Time Histogram•60 minutes
Module 1 Summary•10 minutes
4 assignments•Total 225 minutes
Why Do We Analyze Data Quiz•15 minutes
The Process of Data Analysis Quiz•15 minutes
Knowing Your Data Quiz•15 minutes
Module 1 Summative Assessment•180 minutes
1 discussion prompt•Total 60 minutes
Meet and Greet Discussion•60 minutes
1 ungraded lab•Total 60 minutes
Module 1 Python Lab - VS Code•60 minutes
Module 2: Measure and Visualize Correlation
Module 2•10 hours to complete
Module 2 delves into the intricacies of statistical analysis, beginning with a thorough understanding of the p-value concept and its significance as a Type I Error indicator. Students will learn to apply statistical tests in Python to identify significantly correlated features, exploring various correlation metrics tailored for categorical, mixed-type, and continuous features. This module emphasizes practical application, equipping students with the skills to calculate and interpret these metrics using Python, thereby enhancing their ability to conduct sophisticated data analysis and draw meaningful conclusions from complex datasets.
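To make the idea of correlation metrics for different feature types concrete, here is a minimal sketch of two of them: the Pearson correlation for a pair of continuous features and eta-squared for a categorical feature paired with a continuous one. It is written in plain Python so the arithmetic is visible; the function names and toy values are assumptions for illustration, and p-values are left out because they would require a statistics library.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation between two continuous features."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def eta_squared(groups, values):
    """Share of the variance in `values` explained by the categorical `groups`."""
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical toy data.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
eta2 = eta_squared(["a", "a", "b", "b"], [1, 3, 5, 7])
```

Here the two continuous features are perfectly linearly related, so the Pearson correlation is 1, and the grouping explains 16 of the 20 units of total variation, giving an eta-squared of 0.8.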
What's included: 7 videos, 5 readings, 4 assignments, 1 ungraded lab
7 videos•Total 54 minutes
Module 2 Introduction•2 minutes
Discover and Measure Associations - Part 1•10 minutes
Discover and Measure Associations - Part 2•10 minutes
Measure Associations - Part 1•8 minutes
Measure Associations - Part 1 (Continued)•7 minutes
Measure Associations - Part 2•9 minutes
Measure Associations - Part 2 (Continued)•9 minutes
5 readings•Total 250 minutes
Module 2 Introduction•60 minutes
Chicago Taxi Trip Data•60 minutes
Correlation with Python•60 minutes
Eta-squared•60 minutes
Module 2 Summary•10 minutes
4 assignments•Total 225 minutes
Correlation of Continuous Features Quiz•15 minutes
Correlation of Mixed Types Features•15 minutes
Means to an End for Feature Screening Quiz•15 minutes
Module 2 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 2 Python Lab - VS Code•60 minutes
Module 3: Market Basket Analysis
Module 3•9 hours to complete
Module 3 offers a deep dive into the world of Association Rules, teaching students how to adapt these rules to identify valuable feature combinations that generate specific label values. Learners will master setting appropriate thresholds for Support and Confidence and gain a comprehensive understanding of the Apriori Algorithm and the significance of Frequent Itemsets within it. This module covers the calculation of common metrics for Association Rules, familiarizing students with the relevant terminology. Additionally, learners will explore the practical application of Association Rules in Market Basket Analysis, including strategies for cross-selling, up-selling, and product bundling, equipping them with valuable skills for advanced data-driven decision making in business contexts.
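The following is a compact sketch of the level-wise idea behind the Apriori Algorithm, together with the Support, Confidence, and Lift of a single rule. The baskets, the 0.4 support threshold, and the function names are made up for this example, and the code favors readability over the efficiency a production implementation would need.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rule_metrics(frequent, antecedent, consequent):
    """Support, Confidence, and Lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    rule_support = frequent[a | c]
    confidence = rule_support / frequent[a]
    lift = confidence / frequent[c]
    return {"support": rule_support, "confidence": confidence, "lift": lift}


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.4)
metrics = rule_metrics(itemsets, {"bread"}, {"milk"})
```

With these baskets the rule bread -> milk has a Lift just below 1, so buying bread does not make milk more likely than its baseline rate even though the Confidence looks high.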
What's included: 7 videos, 5 readings, 3 assignments, 1 ungraded lab
7 videos•Total 46 minutes
Module 3 Introduction•1 minute
What is in Your Basket - Part 1•7 minutes
What is in Your Basket - Part 2•6 minutes
How Are Association Rules Discovered - Part 1•9 minutes
How Are Association Rules Discovered - Part 2•8 minutes
What Can Association Rules Tell Me - Part 1•8 minutes
What Can Association Rules Tell Me - Part 2•6 minutes
5 readings•Total 200 minutes
PGML Chapter 3•60 minutes
Cross-Selling•60 minutes
Apriori Algorithm and Association Rules•60 minutes
Module 3 Summary•10 minutes
Insights from an Industry Leader: Learn More About Our Program•10 minutes
3 assignments•Total 210 minutes
Market Basket Analysis Quiz•15 minutes
Association Rules Discovery Quiz•15 minutes
Module 3 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 3 Python Lab - VS Code•60 minutes
Module 4: Partitioning, Segmenting, and Clustering of Observations
Module 4•10 hours to complete
In Module 4, students will learn how to describe and interpret profiles of clusters, gaining proficiency in deploying the K-Means and K-Modes clustering algorithms. They will explore the application of Recency, Frequency, and Monetary (RFM) Analysis to identify the most valuable customers in retail business settings. The module also covers the technique of Simple Random Sampling with the option of incorporating stratification variables, enhancing the precision of data analysis. Furthermore, it emphasizes the importance of objectively validating models using a testing partition, ensuring the reliability and effectiveness of the analytical models in real-world scenarios.
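One way to picture Simple Random Sampling with a stratification variable is the sketch below, which splits row indices into training and testing partitions while keeping each stratum's share roughly the same in both. The function name, the labels, and the 25% testing fraction are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_partition(strata, test_fraction, seed=0):
    """Split row indices into (train, test), sampling within each stratum."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    by_stratum = defaultdict(list)
    for index, label in enumerate(strata):
        by_stratum[label].append(index)

    train, test = [], []
    for indices in by_stratum.values():
        rng.shuffle(indices)  # simple random sampling inside the stratum
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical stratification variable: 20 "churn" rows and 80 "stay" rows.
labels = ["churn"] * 20 + ["stay"] * 80
train_idx, test_idx = stratified_partition(labels, test_fraction=0.25, seed=42)
churn_in_test = sum(labels[i] == "churn" for i in test_idx)
```

Because sampling happens within each stratum, the testing partition keeps the same 20% share of the rarer label as the full dataset, which a plain random split does not guarantee.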
What's included: 8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos•Total 70 minutes
Module 4 Introduction•1 minute
Partition Observations for Training Models - Part 1•10 minutes
Partition Observations for Training Models - Part 2•12 minutes
Create Segments of Observations for Business Reasons - Part 1•10 minutes
Create Segments of Observations for Business Reasons - Part 2•10 minutes
Put Observations with Similar Feature Values in Clusters - Part 1•10 minutes
Put Observations with Similar Feature Values in Clusters - Part 2•11 minutes
Put Observations with Similar Feature Values in Clusters - Part 3•8 minutes
5 readings•Total 220 minutes
PGML Chapter 4•30 minutes
Sampling Techniques•60 minutes
RFM•60 minutes
Clustering•60 minutes
Module 4 Summary•10 minutes
4 assignments•Total 225 minutes
Partition Observations for Training Models Quiz•15 minutes
Segments of Observations Quiz•15 minutes
Clustering Quiz•15 minutes
Module 4 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 4 Python Lab - VS Code•60 minutes
Module 5: Linear Regression
Module 5•10 hours to complete
This module delves into feature importance analysis in machine learning, covering Shapley Values, feature selection methods, statistical evaluation, feature interaction, aliasing, and the Least Squares Algorithm. Students will be able to master these concepts to build robust and interpretable models.
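As a small worked example of the Least Squares idea, the sketch below fits a straight line with one feature using the closed-form estimates and reports R-squared. It is a simplification: the module's models may involve many features, and the function name and data points here are invented for illustration.

```python
def fit_simple_least_squares(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variation in y explained by the fitted line.
    predictions = [intercept + slope * a for a in x]
    ss_error = sum((b - p) ** 2 for b, p in zip(y, predictions))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_error / ss_total,
    }


# Hypothetical data points.
model = fit_simple_least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For these points the fitted line has an intercept of 2.2 and a slope of 0.6, and it explains 60% of the variation in the response.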
What's included: 8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos•Total 53 minutes
Module 5 Introduction•1 minute
Linear Regression Model - Part 1•10 minutes
Linear Regression Model - Part 2•5 minutes
Forward Selection - Part 1•8 minutes
Forward Selection - Part 2•4 minutes
Feature Importance - Part 1•9 minutes
Feature Importance - Part 2•8 minutes
Feature Importance - Part 3•7 minutes
5 readings•Total 250 minutes
Linear Regression Analysis•60 minutes
Least Squares Regression•60 minutes
Forward and Backward Stepwise Regression•60 minutes
Shapley Values•60 minutes
Module 5 Summary•10 minutes
4 assignments•Total 225 minutes
Linear Regression Model Quiz•15 minutes
Feature Selection Quiz•15 minutes
Feature Importance Quiz•15 minutes
Module 5 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 5 Python Lab - VS Code•60 minutes
Module 6: Binary Logistic Regression
Module 6•9 hours to complete
In Module 6, students will master the art of feature selection in machine learning by exploring the Forward and Backward Selection Method, the All-Possible Subsets Method, and the concept of complete and quasi-complete separation. Students will also discover association rules for identifying separations, interpret model parameters and predicted probabilities, and delve into the concepts of maximum likelihood estimation, odds, and odds ratios.
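The relationship between odds, odds ratios, and logistic regression parameters can be sketched with a 2x2 table. For a model with a single binary feature, the maximum likelihood fit reproduces the observed group proportions, so the intercept equals the log odds of the baseline group and the slope equals the log odds ratio. The counts and variable names below are hypothetical.

```python
import math


def sigmoid(z):
    """Convert a log odds value into a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical 2x2 table: events and non-events for two groups.
events_exposed, nonevents_exposed = 30, 70
events_unexposed, nonevents_unexposed = 10, 90

odds_exposed = events_exposed / nonevents_exposed        # 30 : 70
odds_unexposed = events_unexposed / nonevents_unexposed  # 10 : 90
odds_ratio = odds_exposed / odds_unexposed

# Parameters of logit(p) = intercept + slope * x, where x = 1 means exposed.
intercept = math.log(odds_unexposed)
slope = math.log(odds_ratio)

# Predicted probabilities recover the observed event rates of each group.
p_unexposed = sigmoid(intercept)
p_exposed = sigmoid(intercept + slope)
```

Exponentiating the slope gives the odds ratio back, which is why logistic regression coefficients are usually interpreted on the odds scale rather than the probability scale.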
What's included: 6 videos, 5 readings, 4 assignments, 1 ungraded lab
6 videos•Total 34 minutes
Module 6 Introduction•1 minute
Logistic Regression - Part 1•6 minutes
Logistic Regression - Part 2•7 minutes
Forward Selection•9 minutes
Interpret Model and Assess Performance - Part 1•8 minutes
Interpret Model and Assess Performance - Part 2•4 minutes
5 readings•Total 220 minutes
PGML Chapter 6•30 minutes
Predictive Analytics•60 minutes
Forward Selection•60 minutes
Best R-squared for Logistic Regression•60 minutes
Module 6 Summary•10 minutes
4 assignments•Total 225 minutes
Logistic Regression Quiz•15 minutes
Forward Selection Quiz•15 minutes
Blessing and the Curse of Too Many Predictors Quiz•15 minutes
Module 6 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 6 Python Lab - VS Code•60 minutes
Module 7: Decision Trees - The CART Algorithm
Module 7•9 hours to complete
Module 7 will equip students with the ability to harness the power of tree-based models to uncover hidden patterns in their data. Students will be able to describe clusters effectively, intelligently set algorithm parameters, construct business rules from tree results, and utilize variance metrics, entropy values, and Gini indices for optimal tree construction.
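To show how entropy values and Gini indices guide tree construction, the sketch below scores every candidate threshold on one continuous feature and keeps the split with the lowest weighted impurity. It covers a single split only, not a full CART implementation, and the function names and data are assumptions for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 0 for a pure node, larger for a more mixed node."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for an even two-class mix."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )


def best_split(x, y, impurity=gini):
    """Find the threshold on x that minimizes the weighted child impurity."""
    n = len(y)
    best_threshold, best_score = None, impurity(y)
    cut_points = sorted(set(x))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(cut_points, cut_points[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in zip(x, y) if value <= threshold]
        right = [label for value, label in zip(x, y) if value > threshold]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical feature values and labels.
feature = [1, 2, 3, 10, 11, 12]
target = ["no", "no", "no", "yes", "yes", "yes"]
threshold, weighted_gini = best_split(feature, target)
```

The chosen threshold translates directly into a business rule of the form "if the feature is at most 6.5 then predict no, otherwise predict yes".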
What's included: 7 videos, 5 readings, 4 assignments, 1 ungraded lab
7 videos•Total 37 minutes
Module 7 Introduction•1 minute
Motivation of Decision Trees - Part 1•6 minutes
Motivation of Decision Trees - Part 2•5 minutes
The CART Algorithm - Part 1•3 minutes
The CART Algorithm - Part 2•9 minutes
Cluster Profiling - Part 1•4 minutes
Cluster Profiling - Part 2•7 minutes
5 readings•Total 220 minutes
PGML Chapter 5•30 minutes
CART•60 minutes
CART as an Equation•60 minutes
Decision Trees for Clustering•60 minutes
Module 7 Summary•10 minutes
4 assignments•Total 225 minutes
Motivation of Decision Trees Quiz•15 minutes
The CART Algorithm Quiz•15 minutes
Cluster Profiling Quiz•15 minutes
Module 7 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 7 Python Lab - VS Code•60 minutes
Module 8: Evaluating the Performance of Models
Module 8•9 hours to complete
Module 8 delves into evaluation metrics for machine learning models. Students will master the concepts of precision and recall curves, lift curves, and receiver operating characteristic (ROC) curves. They will also learn methods for calculating probability thresholds using Kolmogorov-Smirnov statistics and F1 scores. Finally, they will explore metrics such as misclassification rate, area under the curve (AUC), and root mean squared error (RMSE), along with techniques for computing RMSE and detecting severely misfitted observations using model-specific residuals.
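Several of these metrics can be computed by hand, as in the sketch below: precision, recall, F1 score, and misclassification rate at a given probability threshold, the Kolmogorov-Smirnov statistic as the largest gap between the true positive and false positive rates, and RMSE for a prediction model. The function names, scores, and labels are invented for illustration, and the convention that a score at or above the threshold counts as a predicted event is an assumption.

```python
import math


def classification_metrics(y_true, scores, threshold):
    """Precision, recall, F1, and misclassification rate at one threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "misclassification_rate": (fp + fn) / len(y_true),
    }


def ks_statistic(y_true, scores):
    """Largest gap between true and false positive rates, and its threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_gap, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        tpr = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold) / n_pos
        fpr = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold) / n_neg
        if tpr - fpr > best_gap:
            best_gap, best_threshold = tpr - fpr, threshold
    return best_gap, best_threshold


def rmse(y_true, y_pred):
    """Root mean squared error of a prediction model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical labels (1 = event) and predicted event probabilities.
labels = [0, 0, 1, 0, 1, 1]
probabilities = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
at_half = classification_metrics(labels, probabilities, threshold=0.5)
ks_value, ks_threshold = ks_statistic(labels, probabilities)
error = rmse([3, 5, 7], [2, 5, 9])
```

In this toy case the Kolmogorov-Smirnov threshold (0.35) differs from the default 0.5, which shows why the threshold is worth choosing from the data rather than assuming.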
What's included: 8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos•Total 43 minutes
Module 8 Introduction•1 minute
Prediction Models•8 minutes
Nominal Classification Models•6 minutes
Binary Classification Models - Part 1•4 minutes
Binary Classification Models - Part 2•6 minutes
Binary Classification Models - Part 3•5 minutes
Binary Classification Models - Part 4•6 minutes
Binary Classification Models - Part 5•7 minutes
5 readings•Total 235 minutes
PGML Chapters 7 and 8•45 minutes
Outliers•60 minutes
ROC Curve•60 minutes
Using Lift Analysis•60 minutes
Module 8 Summary•10 minutes
4 assignments•Total 225 minutes
Metrics for Prediction Models Quiz•15 minutes
Metrics for Classification Models Quiz•15 minutes
Charts for Classification Models Quiz•15 minutes
Module 8 Summative Assessment•180 minutes
1 ungraded lab•Total 60 minutes
Module 8 Python Lab - VS Code•60 minutes
Summative Course Assessment
Module 9•3 hours to complete
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course. Be sure to review the course material thoroughly before taking the assessment.
What's included: 1 assignment
1 assignment•Total 180 minutes
Summative Course Assessment•180 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Build toward a degree
This course is part of the following degree program(s) offered by Illinois Tech. If you are admitted and enroll, your completed coursework may count toward your degree learning and your progress can transfer with you.¹
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized by completing this content that may count towards degree requirements, considering any existing credits you may have. Click on a specific course for more information.
Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master’s, and Ph.D. degrees—as well as certificates for in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of the access to real-world opportunities, renowned academic programs, high value, and career prospects of graduates.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.