Johns Hopkins University
YARN MapReduce Architecture and Advanced Programming


Instructor: Karthik Shyamsunder

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Intermediate level

17 hours to complete
3 weeks at 5 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Learn the fundamentals of YARN and MapReduce architectures, including how they work together to process large-scale data efficiently.

  • Understand and implement Mapper and Reducer parallelism in MapReduce jobs to improve data processing efficiency and scalability.

  • Apply optimization techniques such as combiners, partitioners, and compression to enhance the performance and I/O operations of MapReduce jobs.

  • Explore advanced concepts like multithreading, speculative execution, input/output formats, and how to avoid common MapReduce anti-patterns.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

January 2025

Assessments

12 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Big Data Processing Using Hadoop Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review

There are 5 modules in this course

This course provides a comprehensive introduction to the YARN and MapReduce architectures, covering their fundamental components and capabilities. You will explore the MapReduce programming model, focusing on optimization techniques such as combiners, partitioners, and compression. Key concepts like Mapper and Reducer parallelism are demonstrated alongside practical steps for writing and configuring MapReduce jobs. The course also covers advanced topics such as multithreading, speculative execution, and input/output formats. By the end, you will have a deep understanding of MapReduce and be equipped to apply best practices in real-world scenarios.

What's included

2 readings

In this module, we will cover the YARN architecture and its capabilities, followed by the MapReduce architecture built on YARN.

What's included

6 videos, 4 readings, 3 assignments

This module provides a comprehensive overview of the MapReduce API, guiding you through the steps to write a MapReduce program. It covers the concepts of Mapper and Reducer parallelism, illustrating their implementation and impact on data processing efficiency.

What's included

6 videos, 5 readings, 3 assignments

This module focuses on advanced MapReduce optimization techniques, including the use of combiners to enhance performance, partitioners to manage data distribution across reducers, and compression methods to optimize I/O. It also covers the application of counters to collect and analyze statistics about MapReduce jobs.

What's included

6 videos, 5 readings, 3 assignments
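
As a rough illustration of the combiner and partitioner roles named in this module, here is a minimal pure-Python sketch of a word count. It is not taken from the course and does not use the Hadoop API: the function names (`map_words`, `combine`, `partition`, `run_job`) and the sample input are hypothetical, and the whole job is simulated in a single process.

```python
from collections import Counter, defaultdict
import zlib


def map_words(line):
    """Mapper (hypothetical): emit (word, 1) for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner (hypothetical): pre-sum counts per mapper to shrink shuffle traffic."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner (hypothetical): stable hash so a key always maps to one reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    """Simulate map -> combine -> partition -> reduce over a list of input splits."""
    # One bucket per simulated reducer: word -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    shuffled_records = 0
    for split in splits:
        mapped = [pair for line in split for pair in map_words(line)]
        for word, count in combine(mapped):
            buckets[partition(word, num_reducers)][word].append(count)
            shuffled_records += 1
    # Reducer: sum the partial counts that arrived for each word.
    totals = {}
    for bucket in buckets:
        for word, counts in bucket.items():
            totals[word] = sum(counts)
    return totals, shuffled_records


# Two input splits, each handled by its own simulated mapper.
splits = [["to be or not to be"], ["to do is to be"]]
totals, shuffled = run_job(splits)
print(totals["to"], totals["be"], shuffled)
```

In this toy input the mappers emit 11 (word, 1) pairs in total, but only 8 records cross the simulated shuffle because each mapper's output is pre-aggregated first. The same idea motivates combiners in a real job, where the shuffle goes over the network.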

This module explores advanced MapReduce concepts including multithreading, the internals of input/output formats, and speculative execution. It also covers running jobs locally and identifies common MapReduce anti-patterns to avoid.

What's included

7 videos, 5 readings, 3 assignments

Instructor

Karthik Shyamsunder
Johns Hopkins University
4 Courses, 29 learners
