Fundamentals of Reinforcement Learning

This course is part of Reinforcement Learning Specialization

Instructors: Martha White, Adam White

110,716 already enrolled

5 modules

Gain insight into a topic and learn the fundamentals.

2,901 reviews

Intermediate level

Recommended experience

Flexible schedule

2 weeks at 10 hours a week

Learn at your own pace

What you'll learn

Formalize problems as Markov Decision Processes
Understand basic exploration methods and the exploration / exploitation tradeoff
Understand value functions, as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution approach to an industrial control problem

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

92% of learners liked this course

When you enroll in this course, you'll also be enrolled in this Specialization.

There are 5 modules in this course

Reinforcement Learning is a subfield of Machine Learning, but is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the importance and challenges of learning agents that make decisions is vital today, with more and more companies interested in interactive agents and intelligent decision-making.

This course introduces you to the fundamentals of Reinforcement Learning. When you finish this course, you will:

- Formalize problems as Markov Decision Processes
- Understand basic exploration methods and the exploration/exploitation tradeoff
- Understand value functions, as a general-purpose tool for optimal decision-making
- Know how to implement dynamic programming as an efficient solution approach to an industrial control problem

This course teaches you the key concepts of Reinforcement Learning, underlying classic and modern algorithms in RL. After completing this course, you will be able to start using RL for real problems, where you have or can specify the MDP. This is the first course of the Reinforcement Learning Specialization.

Welcome to: Fundamentals of Reinforcement Learning, the first course in a four-part specialization on Reinforcement Learning brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, get a flavour of what the course has in store for you, and be given an in-depth roadmap to help make your journey through this specialization as smooth as possible.

What's included

4 videos, 2 readings, 1 discussion prompt

For the first week of this course, you will learn how to understand the exploration-exploitation trade-off in sequential decision-making, implement incremental algorithms for estimating action-values, and compare the strengths and weaknesses of different algorithms for exploration. For this week’s graded assessment, you will implement and test an epsilon-greedy agent.
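The incremental action-value estimate and epsilon-greedy selection mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's graded assignment: the class name `EpsilonGreedyAgent`, the helper `run_bandit`, and the Gaussian reward model with unit variance are all assumptions made for the example.

```python
import random


class EpsilonGreedyAgent:
    """Sample-average epsilon-greedy agent for a k-armed bandit (illustrative)."""

    def __init__(self, num_actions, epsilon, seed=0):
        self.epsilon = epsilon
        self.q = [0.0] * num_actions   # action-value estimates Q(a)
        self.n = [0] * num_actions     # number of times each action was taken
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties at random among the greedy actions.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, action, reward):
        # Incremental sample average: Q <- Q + (1/N) * (R - Q)
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run the agent on a hypothetical bandit with Gaussian rewards (std 1)."""
    agent = EpsilonGreedyAgent(len(true_means), epsilon, seed)
    noise = random.Random(seed + 1)
    for _ in range(steps):
        action = agent.select_action()
        reward = noise.gauss(true_means[action], 1.0)
        agent.update(action, reward)
    return agent
```

Calling `run_bandit([0.0, 1.0, 5.0])` returns the trained agent, whose `q` and `n` lists can be inspected. The incremental update stores only a running estimate and a count per action, so memory does not grow with the number of steps.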

What's included

8 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt, 2 plugins

8 videos (total 46 minutes)

Sequential Decision Making with Evaluative Feedback (6 minutes)
Learning Action Values (5 minutes)
Estimating Action Values Incrementally (5 minutes)
What is the trade-off? (8 minutes)
Optimistic Initial Values (6 minutes)
Upper-Confidence Bound (UCB) Action Selection (5 minutes)
Jonathan Langford: Contextual Bandits for Real World Reinforcement Learning (9 minutes)
Week 1 Summary (3 minutes)

3 readings (total 70 minutes)

Module 1 Learning Objectives (10 minutes)
Weekly Reading (30 minutes)
Chapter Summary (30 minutes)

1 assignment (total 45 minutes)

Sequential Decision-Making (45 minutes)

1 programming assignment (total 30 minutes)

Bandits and Exploration/Exploitation (30 minutes)

1 discussion prompt (total 10 minutes)

Compare bandits to supervised learning (10 minutes)

2 plugins (total 30 minutes)

Let's play a game! (15 minutes)
What's underneath? (15 minutes)

When you’re presented with a problem in industry, the first and most important step is to translate that problem into a Markov Decision Process (MDP). The quality of your solution depends heavily on how well you do this translation. This week, you will learn the definition of MDPs, you will understand goal-directed behavior and how this can be obtained from maximizing scalar rewards, and you will also understand the difference between episodic and continuing tasks. For this week’s graded assessment, you will create three example tasks of your own that fit into the MDP framework.
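To make the MDP vocabulary above concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration: the two-state battery example, the `dynamics[state][action]` layout holding `(probability, next_state, reward)` triples, and the helper names `is_valid_mdp` and `discounted_return`. The return helper shows how a discount factor below 1 keeps the sum of scalar rewards finite in a continuing task, while an episodic task can use an undiscounted sum.

```python
# Hypothetical two-state MDP.
# dynamics[state][action] -> list of (probability, next_state, reward)
dynamics = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "recharge": [(1.0, "high", 0.0)],
    },
    "high": {
        "search": [(0.7, "high", 1.0), (0.3, "low", 1.0)],
        "wait": [(1.0, "high", 0.2)],
    },
}


def is_valid_mdp(dynamics, tol=1e-9):
    """Check that outcome probabilities sum to 1 and next states are known."""
    for actions in dynamics.values():
        for outcomes in actions.values():
            if abs(sum(p for p, _, _ in outcomes) - 1.0) > tol:
                return False
            if any(s_next not in dynamics for _, s_next, _ in outcomes):
                return False
    return True


def discounted_return(rewards, gamma):
    """Return R_1 + gamma * R_2 + gamma^2 * R_3 + ... for a finite reward list."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Episodic task: a finite episode, summed without discounting.
episodic = discounted_return([1.0, 1.0, 1.0], gamma=1.0)

# Continuing task: a long reward stream stays bounded because gamma < 1.
continuing = discounted_return([1.0] * 1000, gamma=0.9)
```

Here `episodic` is 3.0 and `continuing` is close to 1 / (1 - 0.9) = 10, the limit of the geometric series.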

What's included

7 videos, 2 readings, 1 assignment, 1 peer review, 1 discussion prompt

7 videos (total 36 minutes)

Markov Decision Processes (7 minutes)
Examples of MDPs (4 minutes)
The Goal of Reinforcement Learning (3 minutes)
Michael Littman: The Reward Hypothesis (12 minutes)
Continuing Tasks (5 minutes)
Examples of Episodic and Continuing Tasks (3 minutes)
Week 2 Summary (2 minutes)

2 readings (total 40 minutes)

Module 2 Learning Objectives (10 minutes)
Weekly Reading (30 minutes)

1 assignment (total 45 minutes)

MDPs (45 minutes)

1 peer review (total 60 minutes)

Graded Assignment: Describe Three MDPs (60 minutes)

1 discussion prompt (total 10 minutes)

Is the reward hypothesis sufficient? (10 minutes)

Once the problem is formulated as an MDP, finding the optimal policy is more efficient when using value functions. This week, you will learn the definition of policies and value functions, as well as Bellman equations, the key technology that all of our algorithms will use.
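For reference, the Bellman equations named above are commonly written as follows. The notation is an assumption, not taken from this page: it follows the standard textbook convention in which $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the MDP dynamics, and $\gamma$ the discount factor.

```latex
% Bellman equation for the state-value function of a policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
           \bigl[ r + \gamma \, v_\pi(s') \bigr]

% Bellman optimality equation
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)
         \bigl[ r + \gamma \, v_*(s') \bigr]

% A policy that is greedy with respect to v_* is optimal
\pi_*(s) \in \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)
             \bigl[ r + \gamma \, v_*(s') \bigr]
```

Each equation expresses the value of a state in terms of the values of its possible successor states, which is what lets an algorithm compute values by repeated local updates.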

What's included

9 videos, 3 readings, 2 assignments, 1 discussion prompt

9 videos (total 56 minutes)

Specifying Policies (5 minutes)
Value Functions (6 minutes)
Rich Sutton and Andy Barto: A brief History of RL (8 minutes)
Bellman Equation Derivation (6 minutes)
Why Bellman Equations? (5 minutes)
Optimal Policies (8 minutes)
Optimal Value Functions (5 minutes)
Using Optimal Value Functions to Get Optimal Policies (8 minutes)
Week 3 Summary (4 minutes)

3 readings (total 53 minutes)

Module 3 Learning Objectives (10 minutes)
Weekly Reading (30 minutes)
Chapter Summary (13 minutes)

2 assignments (total 90 minutes)

[Graded] Value Functions and Bellman Equations (45 minutes)
[Practice] Value Functions and Bellman Equations (45 minutes)

1 discussion prompt (total 10 minutes)

Check-in (10 minutes)

This week, you will learn how to compute value functions and optimal policies, assuming you have the MDP model. You will implement dynamic programming to compute value functions and optimal policies and understand the utility of dynamic programming for industrial applications and problems. Further, you will learn about Generalized Policy Iteration as a common template for constructing algorithms that maximize reward. For this week’s graded assessment, you will implement an efficient dynamic programming agent in a simulated industrial control problem.
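As a rough sketch of how dynamic programming computes value functions and optimal policies when the MDP model is known, the Python below alternates iterative policy evaluation with greedy policy improvement (policy iteration). It is not the course's industrial control assignment: the two-state MDP, the `dynamics` layout of `(probability, next_state, reward)` triples, and all function names are assumptions made for illustration.

```python
# Hypothetical MDP model: dynamics[state][action] -> [(prob, next_state, reward)]
dynamics = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "recharge": [(1.0, "high", 0.0)],
    },
    "high": {
        "search": [(0.7, "high", 1.0), (0.3, "low", 1.0)],
        "wait": [(1.0, "high", 0.2)],
    },
}


def q_value(dynamics, values, state, action, gamma):
    """Expected one-step return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in dynamics[state][action])


def evaluate_policy(dynamics, policy, gamma, theta=1e-10):
    """Iterative policy evaluation with in-place sweeps over all states."""
    values = {s: 0.0 for s in dynamics}
    while True:
        delta = 0.0
        for s in dynamics:
            new_v = q_value(dynamics, values, s, policy[s], gamma)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < theta:
            return values


def improve_policy(dynamics, values, gamma):
    """Return the deterministic policy that is greedy with respect to `values`."""
    return {
        s: max(dynamics[s], key=lambda a: q_value(dynamics, values, s, a, gamma))
        for s in dynamics
    }


def policy_iteration(dynamics, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = {s: next(iter(dynamics[s])) for s in dynamics}  # arbitrary start
    while True:
        values = evaluate_policy(dynamics, policy, gamma)
        new_policy = improve_policy(dynamics, values, gamma)
        if new_policy == policy:
            return policy, values
        policy = new_policy


policy, values = policy_iteration(dynamics)
```

For this toy model the loop settles on recharging in the `low` state and searching in the `high` state. The same evaluate-then-improve pattern is the template that Generalized Policy Iteration relaxes, for example by running only a partial evaluation sweep before each improvement step.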

What's included

10 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

10 videos (total 72 minutes)

Policy Evaluation vs. Control (5 minutes)
Iterative Policy Evaluation (9 minutes)
Policy Improvement (4 minutes)
Policy Iteration (8 minutes)
Flexibility of the Policy Iteration Framework (4 minutes)
Efficiency of Dynamic Programming (5 minutes)
Warren Powell: Approximate Dynamic Programming for Fleet Management (Short) (8 minutes)
Warren Powell: Approximate Dynamic Programming for Fleet Management (Long) (22 minutes)
Week 4 Summary (3 minutes)
Congratulations! (4 minutes)

3 readings (total 70 minutes)

Module 4 Learning Objectives (10 minutes)
Weekly Reading (30 minutes)
Chapter Summary (30 minutes)

1 assignment (total 45 minutes)

Dynamic Programming (45 minutes)

1 programming assignment (total 30 minutes)

Optimal Policies with Dynamic Programming (30 minutes)

1 discussion prompt (total 10 minutes)

Where can you use dynamic programming? (10 minutes)

Instructors

Instructor ratings

(835 ratings)

Martha White

University of Alberta

4 courses, 116,934 learners

Adam White

University of Alberta

4 courses, 116,934 learners

Offered by

University of Alberta

Alberta Machine Intelligence Institute

Explore more from Machine Learning

Sample-based Learning Methods (University of Alberta, course)
Prediction and Control with Function Approximation (University of Alberta, course)
A Complete Reinforcement Learning System (Capstone) (University of Alberta, course)

Learner reviews

5 stars: 81.73%
4 stars: 14.31%
3 stars: 2.61%
2 stars: 0.44%
1 star: 0.89%

Showing 3 of 2901

Reviewed on Jul 1, 2021

This course is great for people who are just starting out. The programming assignments are really great and practically introduce you to the basic concepts of reinforcement learning.

Reviewed on Oct 31, 2020

This course gives you a concept of RL and clear examples to understand these concepts. Following the textbook and practicing with the exercises and quizzes, I learned all the concepts.

Reviewed on May 5, 2021

Fantastic course! I have been interested in Reinforcement Learning for a long time and this has been the best introduction I have found so far. It gave me the foundations on the field.


Frequently asked questions

To access course materials, assignments, and earn a Certificate, you'll need to purchase the Certificate experience when you enroll in a course. Eligible learners may also have the option to start with a Free Trial. Some courses may also offer a Full Course, No Certificate option. This lets you access course materials, submit required assessments, and receive a final grade, but you won't be able to earn or purchase a Certificate.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.