Description

This is the most complete Reinforcement Learning course series on Udemy. In it, you will learn to implement some of the most powerful Deep Reinforcement Learning algorithms in Python using PyTorch and PyTorch Lightning. You will implement, from scratch, adaptive algorithms that solve control tasks based on experience. You will learn to combine these techniques with Neural Networks and Deep Learning methods to create adaptive Artificial Intelligence agents capable of solving decision-making tasks.

This course will introduce you to the state of the art in Reinforcement Learning techniques. It will also prepare you for the next courses in this series, where we will explore other advanced methods that excel in other types of tasks.

The course is focused on developing practical skills. Therefore, after learning the most important concepts of each family of methods, we will implement one or more of its algorithms in Jupyter notebooks, from scratch.


Leveling modules: 


- Refresher: The Markov decision process (MDP).

- Refresher: Monte Carlo methods.

- Refresher: Temporal difference methods.

- Refresher: N-step bootstrapping.

- Refresher: Brief introduction to Neural Networks.

- Refresher: Policy gradient methods.



Advanced Reinforcement Learning:


- REINFORCE

- REINFORCE for continuous action spaces

- Advantage actor-critic (A2C)

- Trust region methods

- Proximal policy optimization (PPO)

- Generalized advantage estimation (GAE)

- Trust region policy optimization (TRPO)
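As a preview of the kind of code written throughout the course, here is a minimal pure-Python sketch of the discounted-return computation that underlies REINFORCE and the other algorithms above (function names are illustrative, not taken from the course notebooks; the course implements this with PyTorch tensors):

```python
# Minimal sketch of the discounted return G_t = sum_k gamma^k * r_{t+k},
# the quantity REINFORCE uses to weight each log-probability gradient.

def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t for every timestep of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the end of the episode
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```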

What you will learn

Master some of the most advanced Reinforcement Learning algorithms.

Learn how to create AIs that can act in a complex environment to achieve their goals.

Create advanced Reinforcement Learning agents from scratch using Python's most popular tools (PyTorch Lightning, OpenAI Gym, Optuna).

Learn how to perform hyperparameter tuning (choosing the best experimental conditions for our AI to learn).

Fundamentally understand the learning process for each algorithm.

Debug and extend the algorithms presented.

Understand and implement new algorithms from research papers.

Requirements

  • Be comfortable programming in Python
  • Have completed our course "Reinforcement Learning beginner to master", be familiar with the basics of Reinforcement Learning, or watch the leveling sections included in this course
  • Know basic statistics (mean, variance, normal distribution)

Course content

15 sections

Introduction

6 lectures
Introduction
06:07
Reinforcement Learning series
00:14
Google Colab
01:26
Where to begin
00:57
Complete code
00:03
Connect with me on social media
00:05

Refresher: The Markov Decision Process (MDP)

10 lectures
Elements common to all control tasks
05:44
The Markov decision process (MDP)
05:52
Types of Markov decision process
02:23
Trajectory vs episode
01:13
Reward vs Return
01:39
Discount factor
04:19
Policy
02:16
State values v(s) and action values q(s,a)
01:11
Bellman equations
03:17
Solving a Markov decision process
03:21
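The MDP refresher ends with solving a Markov decision process. As a rough sketch of what "solving" means, here is value iteration on a tiny hypothetical 2-state MDP (the example MDP and function names are illustrative only, not from the course):

```python
# Solving the Bellman optimality equation
#   v(s) = max_a sum_s' p(s'|s,a) * (r + gamma * v(s'))
# by value iteration on a toy MDP.

def value_iteration(num_states, transitions, gamma=0.9, tol=1e-8):
    """transitions[s][a] = list of (prob, next_state, reward)."""
    v = [0.0] * num_states
    while True:
        delta = 0.0
        for s in range(num_states):
            best = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s].values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v

# Two states: from state 0, "go" reaches state 1 with reward 1;
# state 1 is absorbing with reward 0.
mdp = [
    {"go": [(1.0, 1, 1.0)], "stay": [(1.0, 0, 0.0)]},
    {"stay": [(1.0, 1, 0.0)]},
]
print(value_iteration(2, mdp))  # approximately [1.0, 0.0]
```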

Refresher: Monte Carlo methods

3 lectures
Monte Carlo methods
03:09
Solving control tasks with Monte Carlo methods
06:56
On-policy Monte Carlo control
04:33
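The core idea of this refresher, estimating values by averaging sampled returns, can be sketched in a few lines of plain Python (a first-visit variant; names are illustrative, not from the course notebooks):

```python
# First-visit Monte Carlo value estimation: v(s) is the average of the
# returns observed after the first visit to s in each episode.

def mc_value_estimate(episodes, gamma=1.0):
    """episodes: list of [(state, reward), ...] trajectories."""
    totals, counts = {}, {}
    for episode in episodes:
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()
        seen = set()
        for state, g in returns:  # first visit to each state only
            if state not in seen:
                seen.add(state)
                totals[state] = totals.get(state, 0.0) + g
                counts[state] = counts.get(state, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

episodes = [[("A", 0.0), ("B", 1.0)], [("A", 0.0), ("B", 3.0)]]
print(mc_value_estimate(episodes))  # {'A': 2.0, 'B': 2.0}
```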

Refresher: Temporal difference methods

6 lectures
Temporal difference methods
03:16
Solving control tasks with temporal difference methods
03:58
Monte Carlo vs temporal difference methods
01:23
SARSA
03:53
Q-Learning
02:22
Advantages of temporal difference methods
00:56
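The Q-Learning lecture in this refresher centers on a single update rule, which can be sketched as follows (the dict-of-dicts Q-table layout is an illustrative choice, not the course's):

```python
# The tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy TD update on a dict-of-dicts Q-table."""
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    td_target = r + gamma * best_next
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]

q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}
print(q_learning_update(q, "s0", "right", 0.5, "s1", alpha=0.5, gamma=1.0))  # 1.25
```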

Refresher: N-step bootstrapping

3 lectures
N-step temporal difference methods
03:30
Where do n-step methods fit?
02:53
Effect of changing n
04:36
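The n-step return this section studies, truncate after n rewards and bootstrap with the value estimate, is a short recursion (sketch only; names are illustrative):

```python
# The n-step return:
#   G_t^(n) = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n * v(s_{t+n})

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """rewards: the next n rewards; bootstrap_value: v(s_{t+n})."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# n = 2 with gamma = 0.5:  1 + 0.5*1 + 0.25*4 = 2.5
print(n_step_return([1.0, 1.0], 4.0, gamma=0.5))  # 2.5
```

Changing n trades bias for variance: n = 1 recovers the one-step TD target, while a large n approaches the Monte Carlo return.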

Refresher: Brief introduction to Neural Networks

6 lectures
Function approximators
07:35
Artificial Neural Networks
03:32
Artificial Neurons
05:38
How to represent a Neural Network
06:44
Stochastic Gradient Descent
05:40
Neural Network optimization
04:01
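The optimization idea behind this refresher, repeatedly stepping against the gradient, fits in a few lines. Here is plain gradient descent on a one-dimensional quadratic (a deliberately tiny stand-in for the stochastic, high-dimensional case the course covers):

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# SGD works the same way but uses a noisy gradient estimate from a minibatch.

def gradient_descent(grad, w0=0.0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

w = gradient_descent(lambda w: 2.0 * (w - 3.0))
print(round(w, 4))  # converges toward the minimum at w = 3
```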

Refresher: REINFORCE

8 lectures
Policy gradient methods
04:16
Representing policies using neural networks
04:43
Policy performance
02:16
The policy gradient theorem
03:20
REINFORCE
03:38
Parallel learning
03:06
Entropy regularization
05:39
REINFORCE 2
02:03
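The loss this refresher builds up to, the log-probability of the chosen action weighted by the return, plus an entropy bonus, can be sketched for a single step (pure Python for clarity; the course constructs the same quantity with PyTorch so autograd can differentiate it):

```python
import math

# REINFORCE objective with entropy regularization (single-step sketch):
#   loss = -log pi(a|s) * G  -  beta * H(pi)
# The entropy term discourages the policy from collapsing prematurely.

def reinforce_loss(action_probs, chosen_action, episode_return, beta=0.01):
    log_prob = math.log(action_probs[chosen_action])
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return -log_prob * episode_return - beta * entropy

print(reinforce_loss([0.5, 0.5], chosen_action=0, episode_return=2.0))
```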

PyTorch Lightning

8 lectures
PyTorch Lightning
07:56
Link to the code notebook
00:02
Create the policy
13:37
Create the environment
09:31
Create the dataset
14:02
Create the REINFORCE algorithm - Part 1
06:46
Create the REINFORCE algorithm - Part 2
10:45
Check the resulting agent
05:57

REINFORCE for continuous control tasks

8 lectures
REINFORCE for continuous action spaces
04:55
Link to the code notebook
00:02
Create the policy
09:47
Create the inverted pendulum environment
08:46
Create the dataset
08:59
Creating the algorithm - Part 1
06:24
Creating the algorithm - Part 2
06:45
Check the resulting agent
02:39

Advantage Actor Critic (A2C)

8 lectures
A2C
10:49
Link to the code notebook
00:02
Create the policy and value network
04:19
Create the environment
05:39
Create the dataset
03:24
Implement A2C - Part 1
05:47
Implement A2C - Part 2
10:40
Check the resulting agent
02:39
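The quantity that gives A2C its name, the advantage, reduces in the one-step case to the TD error, sketched here (a scalar illustration; the course implements the batched PyTorch version):

```python
# One-step advantage used by A2C:
#   A(s,a) = r + gamma * v(s') - v(s)
# The actor is trained on -log pi(a|s) * A; the critic regresses the TD target.

def a2c_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """TD-error form of the advantage; the bootstrap is dropped on terminal steps."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

print(a2c_advantage(1.0, value_s=0.5, value_next=2.0, gamma=0.5))  # 1.5
```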

Trust region methods

6 lectures
Line search vs trust region methods
02:23
Line search methods
06:05
Trust region methods 1
02:49
Kullback-Leibler divergence
04:08
Trust region methods 2
10:06
Trust region methods 3
02:42
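The Kullback-Leibler divergence lecture is central here: trust region methods use KL(p || q) to bound how far a policy update may move. For discrete distributions it is a one-liner:

```python
import math

# KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions p, q.
# Zero when the distributions are identical, and it grows as they diverge.

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))            # 0.0
print(round(kl_divergence([0.9, 0.1], [0.5, 0.5]), 4))  # 0.3681
```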

Proximal Policy Optimization (PPO)

7 lectures
Proximal Policy Optimization
08:57
Link to the code notebook
00:02
Create the environment
08:04
Create the dataset
08:14
Create the PPO algorithm - Part 1
07:53
Create the PPO algorithm - Part 2
14:51
Check the resulting agent
02:16
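The heart of this section is PPO's clipped surrogate objective, sketched here for a single state-action pair (the course implements the batched PyTorch version):

```python
# PPO clipped surrogate objective:
#   L = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
# where ratio = pi_new(a|s) / pi_old(a|s). Clipping removes the incentive
# to move the policy far from the one that collected the data.

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# A large ratio is clipped: no extra benefit beyond 1 + eps.
print(ppo_clipped_objective(1.5, advantage=1.0))   # 1.2
# With a negative advantage, the min keeps the more pessimistic unclipped term.
print(ppo_clipped_objective(1.5, advantage=-1.0))  # -1.5
```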

Generalized Advantage Estimation (GAE)

7 lectures
Generalized Advantage Estimation
11:02
Link to the code notebook
00:02
Create the Half Cheetah environment
05:04
Create the dataset
11:52
PPO with generalized advantage estimation - Part 1
03:23
PPO with generalized advantage estimation - Part 2
07:09
Checking the resulting agent
01:31
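GAE itself is a short backward recursion over TD errors, sketched below (list-based for clarity; the course's version operates on PyTorch tensors):

```python
# Generalized Advantage Estimation:
#   A_t = sum_k (gamma * lam)^k * delta_{t+k},
#   delta_t = r_t + gamma * v(s_{t+1}) - v(s_t)
# computed with a single backward pass. lam interpolates between the
# one-step TD error (lam = 0) and the Monte Carlo advantage (lam = 1).

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """values: v(s_t) for each step; last_value: v(s_T) used to bootstrap."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

# With gamma = lam = 1 and zero values, GAE reduces to the undiscounted returns.
print(gae_advantages([1.0, 1.0], values=[0.0, 0.0], last_value=0.0,
                     gamma=1.0, lam=1.0))  # [2.0, 1.0]
```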

Trust Region Policy Optimization (TRPO)

9 lectures
Trust region policy optimization 1
03:28
Trust region policy optimization 2
05:19
Link to the code notebook
00:02
TRPO in code - Part 1
03:08
TRPO in code - Part 2
02:08
TRPO in code - Part 3
01:47
TRPO in code - Part 4
04:26
TRPO in code - Part 5
09:34
TRPO in code - Part 6
01:34

Final steps

1 lecture
Final steps
00:06
