We continue to develop temporal-difference (TD) learning, which is a central and novel idea in RL.
Like Monte Carlo (MC) methods, TD learns directly from experience, without a model of the environment.
Like Dynamic Programming (DP), it updates estimates based in part on other learned estimates, a property known as bootstrapping.
An important difference from MC is that TD makes useful updates after every time step, rather than waiting until the end of an episode.
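To make the per-step update concrete, here is a minimal sketch of tabular TD(0) prediction. The environment interface (`env.reset()`, `env.step()`), the step size `alpha`, and the discount `gamma` are illustrative assumptions for this sketch, not the module's actual code.

```python
# Minimal sketch of tabular TD(0) prediction (assumed env interface).
# After every step, V[state] moves toward the bootstrapped target
# reward + gamma * V[next_state], rather than waiting for the episode return.
from collections import defaultdict

def td0_prediction(env, policy, num_episodes=500, alpha=0.1, gamma=0.99):
    V = defaultdict(float)  # state-value estimates, default 0.0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD(0) update: bootstraps from the current estimate V[next_state]
            target = reward + gamma * V[next_state] * (not done)
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```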
Q-Learning, an early breakthrough in RL, is an off-policy control algorithm built on TD learning.
In this module, we will cover the details of TD learning and Q-Learning, and implement and study the ideas in code.
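As a preview, the sketch below shows the tabular Q-Learning update with an epsilon-greedy behavior policy. The environment interface (`env.reset()`, `env.step()`, `env.num_actions`) and the hyperparameter values are assumptions made for illustration; the full implementation is developed later in the module.

```python
# Sketch of tabular Q-Learning (assumed env interface; details covered in this module).
# Off-policy: the target bootstraps from the greedy (max) action value,
# regardless of which action the behavior policy actually took.
import random
from collections import defaultdict

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * env.num_actions)  # action-value estimates
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Behavior policy: epsilon-greedy with respect to the current Q
            if random.random() < epsilon:
                action = random.randrange(env.num_actions)
            else:
                action = max(range(env.num_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # Off-policy target: reward plus discounted max over next-state action values
            target = reward + gamma * max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```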
At the conclusion of this module, you should be able to:
- Explain how Q-Learning works and why it learns off policy
- Use Q-Learning to estimate action-value functions
- Perform sensitivity analysis on the hyperparameters of a Q-Learning algorithm