What is a real-time streaming pipeline in this course?

In this course, a real-time streaming pipeline is a connected flow that ingests events as they arrive, processes them continuously, and produces updated outputs without waiting for a scheduled batch run. The emphasis is on designing that flow so it stays low-latency, scalable, and reliable as data keeps moving.

When would you use this kind of real-time pipeline?

You would use this kind of pipeline when the value of the data depends on handling it as it happens rather than much later. The course frames it for ongoing event streams where timely processing, continuous analysis, and immediate outputs matter.

How does a streaming pipeline fit into a broader workflow?

A streaming pipeline sits between event sources and the systems that use processed results, turning raw event flow into structured, ongoing outputs. In the course, it is treated as the repeatable middle layer that connects ingestion, transformation, and operational monitoring.

How is a streaming pipeline different from batch processing?

A streaming pipeline works on events continuously, while batch processing collects data first and runs later on a schedule. The course uses this contrast to show why streaming is better suited to low-latency work, but also why it requires added attention to state, late data, and fault tolerance.
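The contrast can be sketched in plain Python, with no Kafka or Spark involved. The function names and sample amounts are hypothetical and chosen only for illustration. The batch version needs the whole dataset before it can answer, while the streaming version keeps a small piece of state and emits an updated result after each event.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[float]) -> float:
    """Batch style: wait until all events are collected, then compute once."""
    return sum(events)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: update running state and emit a result per event."""
    running = 0.0  # state that must be kept between events
    for amount in events:
        running += amount
        yield running


amounts = [10.0, 25.0, 5.0]  # hypothetical transaction amounts
print(batch_total(amounts))             # 40.0, available only at the end
print(list(streaming_totals(amounts)))  # [10.0, 35.0, 40.0], one per event
```

The running total is the kind of state a streaming job has to protect against failures, which is where the added attention to fault tolerance comes from.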

Do you need any prerequisites before learning to build streaming pipelines?

A basic background in Python or Scala, comfort with the command line, and a high-level understanding of distributed systems are helpful before you start. The course also assumes introductory familiarity with Kafka and Spark rather than deep experience building streaming systems.

What tools, platforms, or methods are used in this course?

The course centers on Apache Kafka for event ingestion and Apache Spark Structured Streaming for continuous processing. It also introduces event-driven design and reliability practices such as checkpointing and monitoring.

What specific tasks will you practice or complete in this course?

You practice designing Kafka topics and event flow, connecting live streams to Spark, and applying transformations, windowing, and stateful processing to incoming data. You also work on checkpointing, monitoring, and tuning so the pipeline can run reliably in real time.
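As a rough illustration of windowing and stateful processing, the sketch below counts events per 60-second tumbling window in plain Python. It is not Spark code. The window size, the lateness tolerance, and the sample events are assumptions made for this example, and the watermark rule is a simplified version of the idea rather than Spark Structured Streaming's exact semantics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60    # assumed tumbling window size
ALLOWED_LATENESS = 30  # assumed tolerance for late events, in seconds


def window_counts(events: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, account) using event time.

    Each event is (event_time_seconds, account_id). An event is dropped when
    its timestamp is older than the watermark, defined here as the highest
    event time seen so far minus ALLOWED_LATENESS.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    max_event_time = 0
    for event_time, account in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        if event_time < watermark:
            continue  # too late: its window is treated as already final
        window_start = event_time - (event_time % WINDOW_SECONDS)
        counts[(window_start, account)] += 1
    return dict(counts)


# Hypothetical out-of-order stream: (50, "A") is late but accepted,
# (65, "B") arrives after the watermark has passed it and is dropped.
stream = [(5, "A"), (20, "A"), (70, "B"), (50, "A"), (130, "A"), (65, "B")]
print(window_counts(stream))
# {(0, 'A'): 3, (60, 'B'): 1, (120, 'A'): 1}
```

The `counts` dictionary plays the role of the state that a stateful streaming operator maintains between events.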

Design Real-Time Architectures with Spark & Kafka

This course is part of the Real-Time, Real Fast: Kafka & Spark for Data Engineers Specialization

Instructor: Soheil Haddadi

3 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

4 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Examine core real-time data principles and how Kafka and Spark support streaming architectures.
Create real-time pipelines by connecting Kafka topics with Spark Structured Streaming.
Improve and deploy streaming systems using monitoring, fault tolerance, and tuning.

Details to know

Shareable certificate

Add to your LinkedIn profile

Build your subject-matter expertise

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 3 modules in this course

“Design Real-Time Architectures with Apache Spark & Kafka” is an intermediate-level course crafted for learners aiming to build modern, scalable streaming systems. Across engaging, scenario-driven lessons, the course offers a comprehensive introduction to designing and implementing real-time data pipelines. Participants explore the foundations of streaming concepts, event-driven patterns, and the unique demands of low-latency processing. They gain practical experience working with Apache Kafka for event ingestion and Apache Spark Structured Streaming for real-time computation, learning to transform raw streams into actionable insights. The curriculum emphasizes reliable pipeline design, covering fault tolerance, checkpointing, and performance tuning to ensure systems can operate at scale. Through hands-on practice, guided dialogues, and real-world financial data scenarios, learners develop the confidence to architect, optimize, and deploy production-ready streaming solutions. By the end of the course, they are equipped with the technical and strategic skills needed to excel in today’s data-driven, real-time environments.

Learners should know basic Python or Scala, be comfortable with the command line, understand distributed systems at a high level, and have introductory familiarity with Kafka and Spark. This course is ideal for aspiring data engineers, analysts or data scientists shifting into real-time systems, and software engineers exploring event-driven architecture. It also suits anyone working with large-scale data or financial and AI/ML pipelines who wants to understand how real-time data powers modern systems.

Module details

This module introduces the core principles behind real-time data systems and how they differ from traditional batch processing. Learners explore key patterns such as event-driven design, streaming workflows, and the roles Kafka and Spark play in a modern data ecosystem. By the end, learners understand the foundational components required to build low-latency, scalable streaming architectures.

What's included

4 videos, 2 readings, 1 peer review

4 videos (total 18 minutes)

Welcome to the Real-Time Architectures with Apache Spark & Kafka (2 minutes)
Key Components: Kafka, Spark & Supporting Ecosystem Tools (5 minutes)
Event-Driven Patterns and Streaming Design Principles (5 minutes)
Key Components: Kafka, Spark & Supporting Ecosystem Tools (6 minutes)

2 readings (total 10 minutes)

Welcome to the Course: Course Overview (5 minutes)
Streaming Data vs. Stream Processing vs. Real-Time Analytics (5 minutes)

1 peer review (total 20 minutes)

Hands-On-Learning: Mapping a Real-Time Architecture for Live Transaction Monitoring (20 minutes)

In this module, learners dive into the practical construction of streaming pipelines using Kafka and Spark Structured Streaming. They design Kafka topics, configure producers and consumers, and connect Spark to process incoming data streams. The module emphasizes transformations, windowing, and stateful operations essential for building functional real-world pipelines.
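One topic-design idea that comes up when configuring producers is keying events so that all records for the same entity land in the same partition and keep their relative order. The sketch below shows that idea in plain Python without a Kafka client. The partition count, the account keys, and the use of CRC32 are assumptions for illustration. Kafka's default partitioner uses a different hash function (murmur2), so actual partition numbers would differ.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for a hypothetical topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically.

    CRC32 is a stand-in hash for this sketch, not what Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical (account_id, amount) events in arrival order.
events = [("acct-1", 100), ("acct-2", 40), ("acct-1", -30), ("acct-3", 75), ("acct-1", 10)]

partitions = defaultdict(list)
for key, amount in events:
    partitions[partition_for(key)].append((key, amount))

# Every event for a given account sits in one partition, in arrival order.
for pid in sorted(partitions):
    print(pid, partitions[pid])
```

Choosing the key is a design decision: a key with few distinct values can concentrate traffic on a small number of partitions, while omitting a key gives up per-entity ordering.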

What's included

3 videos, 1 reading, 1 peer review

3 videos (total 20 minutes)

Designing Kafka Topics, Producers & Consumers (5 minutes)
Connecting Spark Structured Streaming to Kafka (7 minutes)
Transformations, Windows & Stateful Stream Processing (8 minutes)

1 reading (total 5 minutes)

Designing Effective Kafka Topics and Event Streams (5 minutes)

1 peer review (total 20 minutes)

Hands-On-Learning: Building a Streaming Pipeline for Real-Time Transaction Alerts (20 minutes)

This module focuses on preparing real-time systems for production environments. Learners explore fault tolerance, scalability strategies, and performance tuning for Kafka and Spark. They also learn how to monitor streaming workloads, implement checkpoints, and ensure reliability. The module concludes with best practices for deploying and maintaining robust, enterprise-ready real-time architectures.
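The core idea behind checkpointing can be sketched in a few lines of plain Python: persist the position in the input together with the running state, so a restarted job resumes where it stopped instead of reprocessing or losing events. This is a simplified stand-in, not how Spark or Kafka implement it. The file layout, function names, and simulated crash are all assumptions made for the example.

```python
import json
import os
import tempfile
from typing import List, Optional


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_offset": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, to avoid a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(log: List[int], path: str, crash_at: Optional[int] = None) -> int:
    """Process events from the checkpointed offset; optionally simulate a crash."""
    state = load_checkpoint(path)
    for offset in range(state["next_offset"], len(log)):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += log[offset]
        state["next_offset"] = offset + 1
        save_checkpoint(path, state)  # offset and state are saved together
    return state["total"]


log = [5, 10, 20, 40]  # hypothetical replayable event log
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run(log, path, crash_at=2)  # handles offsets 0 and 1, then "dies"
except RuntimeError:
    pass
print(run(log, path))  # restart resumes at offset 2 and prints 75
```

Saving the offset and the total in one write is what keeps each event counted exactly once in this sketch. If they were saved separately, a crash between the two writes could double-count or skip an event.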

What's included

4 videos, 1 reading, 1 assignment, 2 peer reviews

4 videos (total 21 minutes)

Ensuring Reliability with Checkpointing & Fault Tolerance (5 minutes)
Performance Tuning Kafka & Spark for Real-Time Workloads (5 minutes)
Deploying, Monitoring & Managing Streaming Pipelines (8 minutes)
Course Wrap-Up (2 minutes)

1 reading (total 5 minutes)

10× Pipeline Performance: Kafka and Spark Tuning in Practice (5 minutes)

1 assignment (total 20 minutes)

Design Real-Time Architectures with Spark & Kafka (20 minutes)

2 peer reviews (total 80 minutes)

Hands-On-Learning: Optimizing and Monitoring a Production-Ready Streaming System (20 minutes)
Project: Real-Time Streaming Alert System for Money-Laundering Detection (60 minutes)