Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 5 modules in this course
Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting/calling application.
In this course, you will learn about the different tools and techniques that are used with ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will experience how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools used for extracting data, for merging extracted data either logically or physically, and for loading data into data repositories.
You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline some of the multiple methods for loading data into the destination system, verifying data quality, monitoring load failures, and the use of recovery mechanisms in case of failure.
By the end of this course, you will also know how to use Apache Airflow to build data pipelines and be knowledgeable about the advantages of this approach. You will also learn how to use Apache Kafka to build streaming pipelines, as well as the core components of Kafka, which include brokers, topics, partitions, replications, producers, and consumers.
Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.
ETL or Extract, Transform, and Load processes are used for cases where flexibility, speed, and scalability of data are important. You will explore some key differences between similar processes, ETL and ELT, which include the place of transformation, flexibility, Big Data support, and time-to-insight. You will learn that there is an increasing demand for access to raw data that drives the evolution from ETL to ELT. Data extraction involves advanced technologies including database querying, web scraping, and APIs. You will also learn that data transformation is about formatting data to suit the application and that data is loaded in batches or streamed continuously.
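The "place of transformation" difference can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the destination system; the table names, the sample rows, and the function names are all hypothetical and are not taken from the course materials.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [("alice", "12.50"), ("bob", "7.25"), ("alice", "3.00")]


def run_etl(rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    # Transform before loading: normalize names and cast amounts to numbers.
    cleaned = [(name.upper(), float(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return conn


def run_elt(rows):
    """ELT: load the raw rows as-is; transform later, inside the destination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    return conn


def totals_from_etl(conn):
    query = (
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


def totals_from_elt(conn):
    # The transformation happens on demand, at query time.
    query = (
        "SELECT UPPER(customer), SUM(CAST(amount AS REAL)) "
        "FROM raw_sales GROUP BY UPPER(customer) ORDER BY UPPER(customer)"
    )
    return conn.execute(query).fetchall()


etl_totals = totals_from_etl(run_etl(RAW_ROWS))
elt_totals = totals_from_elt(run_elt(RAW_ROWS))
print(etl_totals)
print(elt_totals)
```

Both paths produce the same totals. The difference is that the ELT destination still holds the untouched raw rows, so a different consumer could apply a different transformation later.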
What's included
7 videos, 3 readings, 2 assignments, 1 plugin
7 videos•Total 32 minutes
Course Intro video•5 minutes
ETL Fundamentals•5 minutes
ELT Basics•4 minutes
Comparing ETL and ELT•4 minutes
Data Extraction Techniques•4 minutes
Introduction to Data Transformation Techniques•4 minutes
Data Loading Techniques•4 minutes
3 readings•Total 9 minutes
IBM Product Spotlight: IBM Instana•2 minutes
Course Introduction•4 minutes
Summary & Highlights•3 minutes
2 assignments•Total 40 minutes
Graded Quiz: ETL and ELT Processes•30 minutes
ETL and ELT Processes•10 minutes
1 plugin•Total 5 minutes
Interactivity: Tell the Difference between ETL and ELT•5 minutes
ETL & Data Pipelines: Tools and Techniques
Module 2•3 hours to complete
Module details
Extract, transform, and load (ETL) pipelines can be created with Bash scripts that run on a schedule using cron. Data pipelines move data from one place, or form, to another. Data pipeline processes include scheduling or triggering, monitoring, maintenance, and optimization. Batch pipelines extract and operate on batches of data, whereas streaming data pipelines ingest data packets one by one in rapid succession. In this module, you will learn that streaming pipelines apply when the most current data is needed. You will explore how parallelization and I/O buffers help mitigate bottlenecks. You will also learn how to describe data pipeline performance in terms of latency and throughput.
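As a rough illustration of latency and throughput, the sketch below compares a batch pipeline with a streaming one on the same workload. It is a simplified model, not a benchmark: it assumes a batch is released the moment its last record arrives, that processing takes no time, and that streaming can be approximated as a batch size of one. The function names and the arrival times are hypothetical.

```python
from itertools import islice


def batches(records, batch_size):
    """Batch style: group records and hand them on one batch at a time."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def mean_latency(arrival_times, batch_size):
    """Average wait between a record arriving and its batch being released.

    Expects a non-empty, ascending list of arrival times in seconds.
    """
    waits = []
    for batch in batches(arrival_times, batch_size):
        release_time = batch[-1]  # released when the last record arrives
        waits.extend(release_time - arrived for arrived in batch)
    return sum(waits) / len(waits)


def throughput(record_count, elapsed_seconds):
    """Records handled per second over the whole run."""
    return record_count / elapsed_seconds


# Hypothetical workload: one record arrives each second for six seconds.
arrivals = [1, 2, 3, 4, 5, 6]

batch_latency = mean_latency(arrivals, batch_size=3)   # wait for a full batch
stream_latency = mean_latency(arrivals, batch_size=1)  # released immediately
records_per_second = throughput(len(arrivals), elapsed_seconds=6)

print(batch_latency, stream_latency, records_per_second)
```

In this model both pipelines reach the same throughput of one record per second, but records in the batch pipeline wait one second on average while streamed records do not wait at all, which is why streaming suits cases that need the most current data.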
What's included
5 videos, 4 readings, 4 assignments, 1 app item, 1 plugin
5 videos•Total 25 minutes
ETL Using Shell Scripting•5 minutes
Introduction to Data Pipelines•4 minutes
Key Data Pipeline Processes•5 minutes
Batch versus Streaming Data Pipeline Use Cases•5 minutes
Data Pipeline Tools and Technologies•7 minutes
4 readings•Total 15 minutes
Linux Commands and Shell Scripting•2 minutes
ETL Techniques•10 minutes
Summary & Highlights•1 minute
Summary & Highlights•2 minutes
4 assignments•Total 80 minutes
Graded Quiz: ETL using Shell Scripts•30 minutes
Graded Quiz: An Introduction to Data Pipelines•30 minutes
Practice Quiz: ETL using Shell Scripts•10 minutes
Practice Quiz: An Introduction to Data Pipelines•10 minutes
1 app item•Total 30 minutes
Hands-On Lab: ETL using Shell Scripts•30 minutes
1 plugin•Total 10 minutes
Interactivity: Differentiate between Batch Processing and Stream Processing•10 minutes
Building Data Pipelines using Airflow
Module 3•3 hours to complete
Module details
The key advantage of Apache Airflow's approach to representing data pipelines as DAGs is that they are expressed as code, which makes your data pipelines more maintainable, testable, and collaborative. Tasks, the nodes in a DAG, are created by implementing Airflow's built-in operators.
In this module, you will learn that Apache Airflow has a rich UI that simplifies working with data pipelines. You will explore how to visualize your DAG in graph or tree mode. You will also learn about the key components of a DAG definition file, and that Airflow logs are saved to the local file system and can then be sent to cloud storage, search engines, and log analyzers.
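The idea of tasks as nodes in a DAG can be sketched without Airflow at all. The example below uses Python's standard-library `graphlib` module, not Airflow's operators or scheduler; the task names and the `run_pipeline` helper are hypothetical and serve only to show how dependencies determine execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of tasks
# that must finish first (its upstream dependencies).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_pipeline(graph, tasks):
    """Run every task once, in an order that respects the dependencies."""
    completed = []
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()  # call the function registered for this task
        completed.append(name)
    return completed


log = []
# Each task here only records that it ran; real tasks would do actual work.
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}
order = run_pipeline(dependencies, tasks)
print(order)
```

Because the whole pipeline is ordinary code, it can be version-controlled, reviewed, and tested like any other program. If the dependencies formed a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which is one reason the graph has to be acyclic.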
What's included
5 videos, 1 reading, 2 assignments, 4 app items, 1 plugin
5 videos•Total 25 minutes
Apache Airflow Overview•6 minutes
Advantages of Representing Data Pipelines as DAGs in Apache Airflow•7 minutes
Apache Airflow UI•4 minutes
Build a DAG Using Airflow•4 minutes
Airflow Logging and Monitoring•4 minutes
1 reading•Total 3 minutes
Summary & Highlights•3 minutes
2 assignments•Total 40 minutes
Graded Quiz: Building Data Pipelines using Airflow•30 minutes
Practice Quiz: Building Data Pipelines using Airflow•10 minutes
4 app items•Total 120 minutes
Hands-on Lab: Getting Started with Apache Airflow•20 minutes
Hands-on Lab: Create a DAG for Apache Airflow with PythonOperator•40 minutes
Hands-on Lab: Create a DAG for Apache Airflow with BashOperator•40 minutes
Hands-on Lab: Monitoring a DAG•20 minutes
1 plugin•Total 15 minutes
Reading: DAG Structure and Operators•15 minutes
Building Streaming Pipelines using Kafka
Module 4•3 hours to complete
Module details
Apache Kafka is a very popular open source event streaming platform. An event is a type of data that describes an entity’s observable state updates over time. Popular Kafka service providers include Confluent Cloud, IBM Event Streams, and Amazon MSK. Additionally, the Kafka Streams API is a client library that supports data processing in event streaming pipelines.
In this module, you will learn that the core components of Kafka are brokers, topics, partitions, replications, producers, and consumers. You will explore two special types of processors in the Kafka Streams API stream-processing topology: the source processor and the sink processor. You will also learn about building event streaming pipelines using Kafka.
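How topics, partitions, producers, and consumers relate can be shown with a toy in-memory model. This is not a Kafka client and does not talk to a broker: the `Topic`, `Producer`, and `Consumer` classes are hypothetical, brokers and replication are left out entirely, and `zlib.crc32` is only a stand-in for the partitioning hash used by real Kafka clients.

```python
import zlib
from collections import defaultdict


class Topic:
    """A named stream of events, split into a fixed number of partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition, so events for one key stay in order.
        # crc32 is a stand-in here; it is not the hash Kafka's clients use.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)


class Producer:
    """Appends events to the end of the partition chosen by the key."""

    def send(self, topic, key, value):
        index = topic.partition_for(key)
        topic.partitions[index].append((key, value))
        return index


class Consumer:
    """Reads each partition from its own stored offset onward."""

    def __init__(self):
        self.offsets = defaultdict(int)  # partition index -> next offset

    def poll(self, topic):
        events = []
        for index, partition in enumerate(topic.partitions):
            events.extend(partition[self.offsets[index]:])
            self.offsets[index] = len(partition)
        return events


topic = Topic("sensor-readings", num_partitions=2)
producer = Producer()
for reading in (21.5, 21.7, 22.0):
    producer.send(topic, key="sensor-a", value=reading)

consumer = Consumer()
first_poll = consumer.poll(topic)
second_poll = consumer.poll(topic)  # nothing new since the first poll
print(first_poll, second_poll)
```

The first poll returns the three readings in the order they were sent, because they share a key and therefore a partition. The second poll returns nothing, since the consumer's offsets already point past everything it has read.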
In this final assignment module, you will apply your newly gained knowledge in hands-on labs, including “Creating ETL Data Pipelines using Apache Airflow”. You will build these ETL pipelines using real-world scenarios.
At IBM, we know how rapidly tech evolves and recognize the crucial need for businesses and professionals to build job-ready, hands-on skills quickly. As a market-leading tech innovator, we’re committed to helping you thrive in this dynamic landscape. Through IBM Skills Network, our expertly designed training programs in AI, software development, cybersecurity, data science, business management, and more, provide the essential skills you need to secure your first job, advance your career, or drive business success. Whether you’re upskilling yourself or your team, our courses, Specializations, and Professional Certificates build the technical expertise that ensures you, and your organization, excel in a competitive world.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.