There are 3 modules in this course
“Design Real-Time Architectures with Apache Spark & Kafka” is an intermediate-level course crafted for learners aiming to build modern, scalable streaming systems. Across engaging, scenario-driven lessons, the course offers a comprehensive introduction to designing and implementing real-time data pipelines. Participants explore the foundations of streaming concepts, event-driven patterns, and the unique demands of low-latency processing. They gain practical experience working with Apache Kafka for event ingestion and Apache Spark Structured Streaming for real-time computation, learning to transform raw streams into actionable insights. The curriculum emphasizes reliable pipeline design, covering fault tolerance, checkpointing, and performance tuning to ensure systems can operate at scale. Through hands-on practice, guided dialogues, and real-world financial data scenarios, learners develop the confidence to architect, optimize, and deploy production-ready streaming solutions. By the end of the course, they are equipped with the technical and strategic skills needed to excel in today’s data-driven, real-time environments.
Learners should know basic Python or Scala, be comfortable with the command line, understand distributed systems at a high level, and have introductory familiarity with Kafka and Spark.
This course is ideal for aspiring data engineers, analysts or data scientists shifting into real-time systems, and software engineers exploring event-driven architecture. It also suits anyone working with large-scale data or financial and AI/ML pipelines who wants to understand how real-time data powers modern systems.
This module introduces the core principles behind real-time data systems and how they differ from traditional batch processing. Learners explore key patterns such as event-driven design, streaming workflows, and the roles Kafka and Spark play in a modern data ecosystem. By the end, learners understand the foundational components required to build low-latency, scalable streaming architectures.
What's included
4 videos, 2 readings, 1 peer review
4 videos•Total 18 minutes
Welcome to the Real-Time Architectures with Apache Spark & Kafka•2 minutes
Streaming Data vs. Stream Processing vs. Real-Time Analytics•5 minutes
1 peer review•Total 20 minutes
Hands-On-Learning: Mapping a Real-Time Architecture for Live Transaction Monitoring•20 minutes
Building Real-Time Pipelines with Kafka & Spark
Module 2•1 hour to complete
In this module, learners dive into the practical construction of streaming pipelines using Kafka and Spark Structured Streaming. They design Kafka topics, configure producers and consumers, and connect Spark to process incoming data streams. The module emphasizes transformations, windowing, and stateful operations essential for building functional real-world pipelines.
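The windowing and stateful operations mentioned above can be sketched without any streaming library. The example below is a minimal plain-Python illustration of a tumbling window, not Spark Structured Streaming code. The names `window_start` and `windowed_totals`, the 60-second window size, and the event fields `ts`, `account`, and `amount` are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size, chosen for the example


def window_start(event_time, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % size)


def windowed_totals(events, size=WINDOW_SECONDS):
    """Sum amounts per (window start, account) as events arrive.

    The `state` dict stands in for the running state a streaming
    engine keeps between events.
    """
    state = defaultdict(float)
    for event in events:
        key = (window_start(event["ts"], size), event["account"])
        state[key] += event["amount"]
    return dict(state)


# Hypothetical transaction events; `ts` is event time in seconds.
events = [
    {"ts": 5, "account": "A", "amount": 10.0},
    {"ts": 42, "account": "A", "amount": 2.5},
    {"ts": 61, "account": "A", "amount": 4.0},
    {"ts": 75, "account": "B", "amount": 1.0},
]
totals = windowed_totals(events)
```

Here the first two events for account A fall into the window starting at 0, while the later events fall into the window starting at 60. A real engine would also decide when a window is complete and can be emitted, which this sketch leaves out.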
This module focuses on preparing real-time systems for production environments. Learners explore fault tolerance, scalability strategies, and performance tuning for Kafka and Spark. They also learn how to monitor streaming workloads, implement checkpoints, and ensure reliability. The module concludes with best practices for deploying and maintaining robust, enterprise-ready real-time architectures.
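The idea behind checkpointing can be illustrated with a small plain-Python sketch: progress and running state are saved after each event, so a restarted job resumes where it stopped instead of reprocessing everything. This is not how Spark stores its checkpoints. The file layout, the function names (`load_checkpoint`, `save_checkpoint`, `run`), and the simulated crash are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_offset": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(log, path, fail_at=None):
    """Process events from the last checkpointed offset onward."""
    state = load_checkpoint(path)
    for offset in range(state["next_offset"], len(log)):
        if offset == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += log[offset]
        state["next_offset"] = offset + 1
        save_checkpoint(path, state)
    return state


log = [5, 7, 11, 13]  # hypothetical event values, indexed by offset
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run(log, checkpoint_path, fail_at=2)  # crashes before processing offset 2
except RuntimeError:
    pass

recovered = run(log, checkpoint_path)  # resumes at offset 2
```

After the restart, each event has been counted exactly once, because the offset and the running total are saved together. Checkpointing after every event keeps the sketch simple; real systems usually checkpoint per batch or on an interval to reduce overhead.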
What's included
4 videos, 1 reading, 1 assignment, 2 peer reviews
4 videos•Total 21 minutes
Ensuring Reliability with Checkpointing & Fault Tolerance•5 minutes
Performance Tuning Kafka & Spark for Real-Time Workloads•5 minutes
What is a real-time streaming pipeline in this course?
In this course, a real-time streaming pipeline is a connected flow that ingests events as they arrive, processes them continuously, and produces updated outputs without waiting for a scheduled batch run. The emphasis is on designing that flow so it stays low-latency, scalable, and reliable as data keeps moving.
When would you use this kind of real-time pipeline?
You would use this kind of pipeline when the value of the data depends on handling it as it happens rather than much later. The course frames it around ongoing event streams in which timely processing, continuous analysis, and immediate outputs matter.
How does a streaming pipeline fit into a broader workflow?
A streaming pipeline sits between event sources and the systems that use processed results, turning raw event flow into structured, ongoing outputs. In the course, it is treated as the repeatable middle layer that connects ingestion, transformation, and operational monitoring.
How is a streaming pipeline different from batch processing?
A streaming pipeline works on events continuously, while batch processing collects data first and runs later on a schedule. The course uses this contrast to show why streaming is better suited to low-latency work, but also why it requires added attention to state, late data, and fault tolerance.
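The contrast can be made concrete with a small plain-Python sketch. The batch function waits for the full dataset and computes once, while the streaming class keeps a small amount of state and emits an updated result after every event. The names `batch_average` and `RunningAverage` and the sample values are assumptions made for illustration, not material from the course.

```python
def batch_average(values):
    """Batch style: wait for the full dataset, then compute once."""
    return sum(values) / len(values)


class RunningAverage:
    """Streaming style: keep small state and update it per event."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        """Fold one event into the state and return the current average."""
        self.count += 1
        self.total += value
        return self.total / self.count


values = [10.0, 20.0, 60.0]  # hypothetical event values
stream = RunningAverage()
outputs = [stream.update(v) for v in values]  # one output per event
```

Both approaches agree once all events have been seen, but the streaming version has an answer available after each event. The state it carries between events is what must survive a failure, which is why fault tolerance needs extra attention in streaming work.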
Do you need any prerequisites before learning to build streaming pipelines?
A basic background in Python or Scala, comfort with the command line, and a high-level understanding of distributed systems are helpful before you start. The course also assumes introductory familiarity with Kafka and Spark rather than deep experience building streaming systems.
What tools, platforms, or methods are used in this course?
The course centers on Apache Kafka for event ingestion and Apache Spark Structured Streaming for continuous processing. It also introduces event-driven design and reliability practices such as checkpointing and monitoring.
What specific tasks will you practice or complete in this course?
You practice designing Kafka topics and event flow, connecting live streams to Spark, and applying transformations, windowing, and stateful processing to incoming data. You also work on checkpointing, monitoring, and tuning so the pipeline can run reliably in real time.
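One consideration when designing Kafka topics is how keyed events are spread across partitions: records with the same key go to the same partition, which preserves their relative order. The sketch below illustrates that idea in plain Python. It uses `zlib.crc32` as a stand-in hash, which is not the hash Kafka's own partitioner uses, and the partition count, function names, and account keys are assumptions made for illustration.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the example topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (crc32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Hypothetical transaction events keyed by account ID.
events = [
    ("acct-1", "deposit"),
    ("acct-2", "withdrawal"),
    ("acct-1", "transfer"),
    ("acct-2", "deposit"),
]
partitions = route(events)
```

Because the hash is deterministic, all events for one account land in a single partition in arrival order. Events for different accounts may or may not share a partition, so ordering is only guaranteed per key.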