Springboard your Big Data career. Master fundamentals of NoSQL, Big Data, and Apache Spark with hands-on job-ready skills in machine learning and data engineering.
Instructors: IBM Skills Network Team
11,680 already enrolled
(191 reviews)
Recommended experience
Beginner level
The courses in the specialization require that you have basic computer and data literacy skills, as well as some programming background with languages such as Python and SQL. No prior knowledge or experience of Big Data and NoSQL is required.
Work with NoSQL databases to insert, update, delete, query, index, aggregate, and shard/partition data (a minimal MongoDB sketch follows this list).
Develop hands-on NoSQL experience working with MongoDB, Apache Cassandra, and IBM Cloudant.
Develop foundational knowledge of Big Data and gain hands-on lab experience using Apache Hadoop, MapReduce, Apache Spark, Spark SQL, and Kubernetes.
Perform Extract, Transform and Load (ETL) processing and Machine Learning model training and deployment with Apache Spark.
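As a rough illustration of the CRUD, indexing, and aggregation skills listed above, here is a minimal MongoDB sketch using the pymongo driver. The connection string, database, and collection names are illustrative assumptions, not part of the course materials.

```python
# Minimal sketch of basic MongoDB CRUD, indexing, and aggregation with pymongo.
# Connection string, database, and collection names are illustrative assumptions.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
db = client["training"]                            # hypothetical database
products = db["products"]                          # hypothetical collection

# Create: insert documents
products.insert_many([
    {"sku": "A100", "name": "keyboard", "price": 35, "qty": 120},
    {"sku": "B200", "name": "monitor",  "price": 180, "qty": 40},
])

# Read: query with a filter, then sort and limit the results
for doc in products.find({"price": {"$lt": 100}}).sort("price", ASCENDING).limit(5):
    print(doc["sku"], doc["price"])

# Update: modify a matching document
products.update_one({"sku": "A100"}, {"$set": {"qty": 110}})

# Index and aggregate
products.create_index([("sku", ASCENDING)], unique=True)
total_stock = list(products.aggregate([{"$group": {"_id": None, "stock": {"$sum": "$qty"}}}]))

# Delete
products.delete_one({"sku": "B200"})
```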
Add to your LinkedIn profile
Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review
Big Data Engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases. Three information-packed courses cover popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, as well as the Apache Spark analytics engine for large-scale data processing.
You start with an overview of the various categories of NoSQL (Not only SQL) data repositories, and then work hands-on with several of them, including IBM Cloudant, MongoDB, and Cassandra. You’ll perform various data management tasks, such as creating & replicating databases and inserting, updating, deleting, querying, indexing, aggregating & sharding data. Next, you’ll gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in-depth working knowledge of Apache Spark, Spark DataFrames, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes. In the final course, you will learn to work with Spark Structured Streaming and Spark ML for performing Extract, Transform and Load (ETL) processing and machine learning tasks.
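To give a flavour of the Spark DataFrame and Spark SQL work described above, here is a minimal PySpark sketch; the file path and column names are illustrative assumptions.

```python
# Minimal PySpark sketch: create a DataFrame and query it with Spark SQL.
# The file path and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nosql-bigdata-demo").getOrCreate()

# Load a CSV file into a DataFrame (header row and schema inference assumed)
sales = spark.read.csv("data/sales.csv", header=True, inferSchema=True)

# DataFrame API: filter and aggregate
sales.filter(sales.amount > 100).groupBy("region").count().show()

# Spark SQL: register a temporary view and run a SQL query against it
sales.createOrReplaceTempView("sales")
spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
""").show()

spark.stop()
```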
This specialization is suitable for beginners in the fields of NoSQL and Big Data – whether you are, or are preparing to be, a Data Engineer, Software Developer, IT Architect, Data Scientist, or IT Manager.
Applied Learning Project
The emphasis in this specialization is on learning by doing. As such, each course includes hands-on labs to practice & apply the NoSQL and Big Data skills you learn during lectures.
In the first course, you will work hands-on with several NoSQL databases – MongoDB, Apache Cassandra, and IBM Cloudant – to perform a variety of tasks: creating databases, adding documents, querying data, utilizing the HTTP API, performing Create, Read, Update & Delete (CRUD) operations, limiting & sorting records, indexing, aggregation, replication, using the CQL shell, keyspace operations, & other table operations.
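The keyspace, table, and CRUD operations mentioned above can also be scripted from Python. Below is a minimal sketch using the DataStax cassandra-driver; the contact point, keyspace, and table are illustrative assumptions, and the same CQL statements can be run interactively in the CQL shell.

```python
# Minimal sketch of keyspace, table, and CRUD operations in Cassandra using the
# DataStax cassandra-driver. Contact point, keyspace, and table are assumptions.
from uuid import uuid4
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # assumed local Cassandra node
session = cluster.connect()

# Keyspace operation: create a keyspace with simple replication
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS training
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("training")

# Table operation: the partition key (user_id) determines how rows are distributed
session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id uuid PRIMARY KEY,
        name text,
        email text
    )
""")

# CRUD: insert, select, update, delete
uid = uuid4()
session.execute("INSERT INTO users (user_id, name, email) VALUES (%s, %s, %s)",
                (uid, "Ada", "ada@example.com"))
row = session.execute("SELECT name, email FROM users WHERE user_id = %s", (uid,)).one()
session.execute("UPDATE users SET email = %s WHERE user_id = %s", ("ada@newmail.com", uid))
session.execute("DELETE FROM users WHERE user_id = %s", (uid,))

cluster.shutdown()
```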
In the next course, you’ll launch a Hadoop cluster using Docker and run MapReduce jobs. You’ll explore working with Spark using Jupyter notebooks on a Python kernel. You’ll build your Spark skills using DataFrames and Spark SQL, and scale your jobs using Kubernetes.
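The MapReduce pattern you run on the Hadoop cluster has a close analogue in Spark’s RDD API, which you can experiment with from a Jupyter notebook. Here is a minimal word-count sketch; the input path is an illustrative assumption.

```python
# Minimal MapReduce-style word count using Spark's RDD API.
# The input path is an illustrative assumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("data/sample_text.txt")            # read lines
      .flatMap(lambda line: line.lower().split())  # map: split lines into words
      .map(lambda word: (word, 1))                 # map: emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)             # reduce: sum counts per word
)

# Print the ten most frequent words
for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, count)

spark.stop()
```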
In the final course, you will use Spark for ETL processing and for Machine Learning model training and deployment using IBM Watson.
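As a rough sketch of the ETL portion of that project, the following PySpark example reads a hypothetical CSV source, cleans and aggregates it, and writes Parquet output; the paths and column names are assumptions, and deployment to IBM Watson is not shown.

```python
# Minimal Extract-Transform-Load sketch with PySpark.
# Source path, column names, and target path are illustrative assumptions;
# model deployment to IBM Watson is not shown here.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-demo").getOrCreate()

# Extract: read raw CSV data
raw = spark.read.csv("data/transactions.csv", header=True, inferSchema=True)

# Transform: drop incomplete rows, derive a column, aggregate per customer
clean = (
    raw.dropna(subset=["customer_id", "amount"])
       .withColumn("amount_usd", F.col("amount") * F.lit(1.0))
       .groupBy("customer_id")
       .agg(F.sum("amount_usd").alias("total_spend"))
)

# Load: write the result as Parquet for downstream ML training
clean.write.mode("overwrite").parquet("output/customer_spend.parquet")

spark.stop()
```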
Differentiate among the four main categories of NoSQL repositories.
Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools.
Perform common MongoDB tasks, including create, read, update, and delete (CRUD) operations.
Execute keyspace, table, and CRUD operations in Cassandra.
Explain the impact of big data, including use cases, tools, and processing methods.
Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
Apply Spark programming basics, including parallel programming with DataFrames, Datasets, and Spark SQL.
Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.
Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.
Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.
Construct data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.
Demonstrate connecting to Spark clusters, building ML pipelines, performing feature extraction and transformation, and persisting models (a minimal pipeline sketch follows this list).
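The pipeline sketch referenced in the last outcome might look something like the following: feature extraction with a VectorAssembler, a logistic regression classifier, evaluation, and model persistence. The data path, column names, and save location are illustrative assumptions.

```python
# Minimal sketch of a Spark ML pipeline: feature extraction, model training,
# evaluation, and model persistence. Data path, columns, and save location
# are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("sparkml-demo").getOrCreate()

df = spark.read.parquet("output/features.parquet")  # assumed columns: f1, f2, f3, label
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Feature extraction/transformation and classifier assembled into one pipeline
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

model = pipeline.fit(train)

# Evaluate the classifier on held-out data (area under ROC by default)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")

# Model persistence: save the fitted pipeline and load it back later
model.write().overwrite().save("models/lr_pipeline")
reloaded = PipelineModel.load("models/lr_pipeline")

spark.stop()
```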
At IBM, we know how rapidly tech evolves and recognize the crucial need for businesses and professionals to build job-ready, hands-on skills quickly. As a market-leading tech innovator, we’re committed to helping you thrive in this dynamic landscape. Through IBM Skills Network, our expertly designed training programs in AI, software development, cybersecurity, data science, business management, and more, provide the essential skills you need to secure your first job, advance your career, or drive business success. Whether you’re upskilling yourself or your team, our courses, Specializations, and Professional Certificates build the technical expertise that ensures you, and your organization, excel in a competitive world.
The specialization requires 36-42 hours of effort to complete. Working 6-8 hours a week, it can be completed within 1-2 months; working 3-4 hours a week, it can be completed in roughly 2-4 months.
Basic computer and data literacy, a grounding in IT systems, working experience with one or more operating systems, some programming background in a language such as Python, some knowledge of SQL, and a willingness to self-learn online. No prior knowledge of Big Data or NoSQL is required.
It is recommended that you complete the courses in the order in which they appear in the Specialization. Course 2 is a prerequisite for Course 3.
University credit is currently not available for this Specialization.
Upon successful completion of the Specialization, you will have the practical knowledge and experience to start tackling Data Engineering tasks involving NoSQL Databases, Big Data and Apache Spark.
This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.
If you subscribe, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.
Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate, or you can audit it to view the course materials for free. When you subscribe to a course that is part of a Specialization, you’re automatically subscribed to the full Specialization. Visit your learner dashboard to track your progress.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. If you only want to read and view the course content, you can audit the course for free. If you cannot afford the fee, you can apply for financial aid.