
Stable Baselines3

Getting Started

"Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch." (Stable Baselines3 Docs)

Resources

  1. [Stable Baselines3 Docs]
  2. [Stable Baselines3 Repo]

Installation

pip install stable-baselines3

An Official Example

# https://github.com/DLR-RM/stable-baselines3?tab=readme-ov-file#example

import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1", render_mode="rgb_array")

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render("human")
    # VecEnv resets automatically
    # if done:
    #   obs = vec_env.reset()

env.close()

Note that model = A2C("MlpPolicy", env, verbose=1) passes the environment directly to the algorithm's constructor, so the same call works with other Gymnasium environments, not only CartPole.
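As a hedged follow-up to the example above, the sketch below trains A2C on an environment chosen by ID and scores the result with SB3's evaluate_policy helper. The function name train_and_evaluate and its default values are illustrative choices, not part of SB3. The imports sit inside the function so the snippet can be loaded even on a machine where SB3 is not installed.

```python
from typing import Tuple


def train_and_evaluate(
    env_id: str = "CartPole-v1",
    total_timesteps: int = 10_000,
    n_eval_episodes: int = 10,
) -> Tuple[float, float]:
    """Train A2C on env_id and return (mean_reward, std_reward)."""
    # Lazy imports: only needed when the function is actually called.
    import gymnasium as gym
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make(env_id)
    model = A2C("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # model.get_env() returns the vectorized environment SB3 built around env.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    env.close()
    return float(mean_reward), float(std_reward)
```

Calling train_and_evaluate("CartPole-v1") requires gymnasium and stable-baselines3 to be installed. The reward it returns varies from run to run.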

Structure

Knowing how to call the built-in algorithms is just the first step. I also want to design my own algorithms on top of the SB3 framework.

My stable_baselines3 package path: /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3
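The path above is specific to my conda environment. A minimal sketch for locating an installed package on any machine, using only the standard library, might look like this. The helper name package_dir is hypothetical, and the function returns None when the package is not installed.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations:
        # Regular packages expose their directory here.
        return Path(next(iter(spec.submodule_search_locations)))
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        # Single-file modules: fall back to the containing directory.
        return Path(spec.origin).parent
    return None


print(package_dir("stable_baselines3"))
```

On my setup this should print the path listed above. Elsewhere it prints wherever pip placed the package.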

Inheritance

UML of All Classes

pyreverse -o pdf -p stable_baseline3_diagram /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3

UML of A2C

ABC (Abstract Base Class) -> BaseAlgorithm -> OnPolicyAlgorithm -> A2C

For details on abstract base classes, see my blog post on the topic.
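The chain ABC -> BaseAlgorithm -> OnPolicyAlgorithm -> A2C can also be read off a class's method resolution order without drawing a diagram. The sketch below uses toy stand-in classes (hypothetical, not SB3 code) so that it runs without SB3. The helper name inheritance_chain is also an illustrative choice.

```python
from abc import ABC
from typing import List


def inheritance_chain(cls: type) -> List[str]:
    """Return class names from the most basic ancestor down to cls, skipping object."""
    return [c.__name__ for c in reversed(cls.__mro__) if c is not object]


# Toy stand-ins (hypothetical) that mirror the shape of the chain above.
class ToyBaseAlgorithm(ABC):
    pass


class ToyOnPolicyAlgorithm(ToyBaseAlgorithm):
    pass


class ToyA2C(ToyOnPolicyAlgorithm):
    pass


print(" -> ".join(inheritance_chain(ToyA2C)))
# ABC -> ToyBaseAlgorithm -> ToyOnPolicyAlgorithm -> ToyA2C
```

With SB3 installed, passing the real A2C class (from stable_baselines3 import A2C) to inheritance_chain should reproduce the chain listed above.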

Paths (relative to /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/):

  • A2C: stable_baselines3/a2c
    • stable_baselines3/a2c/a2c.py
    • stable_baselines3/a2c/policies.py
  • common: stable_baselines3/common
    • stable_baselines3/common/base_class.py
    • stable_baselines3/common/on_policy_algorithm.py
    • stable_baselines3/common/policies.py
    • utils
      • stable_baselines3/common/torch_layers.py
      • stable_baselines3/common/buffers.py
      • stable_baselines3/common/type_aliases.py
      • stable_baselines3/common/utils.py
      • stable_baselines3/common/distributions.py
      • stable_baselines3/common/logger.py
      • stable_baselines3/common/noise.py

pyreverse -o pdf -p stable_baseline3_A2C_diagram /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3/a2c /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3/common
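Because the command above hard-codes my environment's path, it may be handy to assemble it in Python instead. This is a small sketch under that assumption: the helper name pyreverse_command is hypothetical, and it uses only the -o and -p flags already shown above.

```python
import shlex
from typing import List, Sequence


def pyreverse_command(
    project: str, package_dirs: Sequence[str], fmt: str = "pdf"
) -> List[str]:
    """Build the pyreverse argument list for one or more package directories."""
    if not package_dirs:
        raise ValueError("at least one package directory is required")
    return ["pyreverse", "-o", fmt, "-p", project, *package_dirs]


# Same directories as in the command above; adjust site_packages for your setup.
site_packages = "/opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages"
cmd = pyreverse_command(
    "stable_baseline3_A2C_diagram",
    [
        f"{site_packages}/stable_baselines3/a2c",
        f"{site_packages}/stable_baselines3/common",
    ],
)
print(shlex.join(cmd))
```

To actually generate the diagram, the list can be passed to subprocess.run(cmd, check=True). This assumes pyreverse (shipped with pylint) is on the PATH and, for PDF output, that Graphviz is installed.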

The following part has not been finished yet.

It seems that this module is designed for single-agent tasks.

This post is licensed under CC BY 4.0 by the author.