Stable Baselines3
Getting Started
"Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch." (Stable Baselines3 Docs)
Resources
Installation
pip install stable-baselines3
An Official Example
# https://github.com/DLR-RM/stable-baselines3?tab=readme-ov-file#example
import gymnasium as gym

from stable_baselines3 import A2C

env = gym.make("CartPole-v1", render_mode="rgb_array")

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render("human")
    # VecEnv resets automatically
    # if done:
    #     obs = vec_env.reset()

env.close()
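Note that model = A2C("MlpPolicy", env, verbose=1) passes the environment to the algorithm's constructor.
SB3 wraps it in a vectorized environment (the object returned by model.get_env()), so a standard Gymnasium environment can be used directly.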
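The example never calls reset inside the loop because SB3's vectorized environments reset a finished episode on their own and keep the last observation under info["terminal_observation"]. To make that behaviour concrete, here is a toy sketch in plain Python. ToyEnv and ToyAutoResetEnv are hypothetical names of mine and are not SB3's implementation; they only imitate the idea.

```python
class ToyEnv:
    """Hypothetical environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


class ToyAutoResetEnv:
    """Hypothetical wrapper that resets the inner env as soon as an episode ends."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the true last observation, then hand back the first
            # observation of the next episode instead.
            info["terminal_observation"] = obs
            obs = self.env.reset()
        return obs, reward, done, info


if __name__ == "__main__":
    venv = ToyAutoResetEnv(ToyEnv(horizon=3))
    obs = venv.reset()
    for _ in range(7):
        obs, reward, done, info = venv.step(action=0)
        print(obs, done, info)
```

With this pattern the caller can keep stepping forever. The price is that the observation returned on a done step already belongs to the next episode, which is why the terminal one is stored in info.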
Structure
Knowing how to call the existing algorithms is just the first step. I also want to design my own algorithms on top of the SB3 framework.
My stable_baselines3 package path: /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3
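The path above is specific to my conda environment. A small sketch for finding the directory on any machine, using only the standard library, could look like this. The helper name package_dir is my own and not part of SB3.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None.

    `name` must be a top-level package name such as "stable_baselines3".
    Plain modules and built-ins (which have no package directory) give None.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Prints the install directory, or None if SB3 is not installed here.
    print(package_dir("stable_baselines3"))
```

find_spec only looks the package up and does not import it, so this is cheap to run even for a large package.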
Inheritance
UML of All Classes
pyreverse -o pdf -p stable_baseline3_diagram /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3
UML of A2C
ABC (Abstract Base Class) -> BaseAlgorithm -> OnPolicyAlgorithm -> A2C
For details on abstract base classes (ABCs), see my blog.
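To get a feel for how such a chain divides the work, here is a toy sketch of the same shape: ABC -> base algorithm -> on-policy algorithm -> concrete algorithm. All class names are hypothetical and the bodies are placeholders. The split (abstract learn at the base, the loop in the middle class, the update in the leaf) is my simplified reading and should be checked against base_class.py and on_policy_algorithm.py.

```python
from abc import ABC, abstractmethod


class ToyBaseAlgorithm(ABC):
    """Hypothetical base: stores the env and declares what subclasses must provide."""

    def __init__(self, env):
        self.env = env
        self.num_updates = 0

    @abstractmethod
    def learn(self, total_timesteps: int) -> "ToyBaseAlgorithm":
        """Run training and return self."""


class ToyOnPolicyAlgorithm(ToyBaseAlgorithm):
    """Hypothetical middle layer: owns the training loop, defers the update rule."""

    def learn(self, total_timesteps: int) -> "ToyOnPolicyAlgorithm":
        for _ in range(total_timesteps):
            self.train()  # one (placeholder) update per step
        return self

    def train(self) -> None:
        raise NotImplementedError("implemented by concrete algorithms")


class ToyA2C(ToyOnPolicyAlgorithm):
    """Hypothetical leaf: only supplies the update rule."""

    def train(self) -> None:
        self.num_updates += 1  # stand-in for a real gradient step


if __name__ == "__main__":
    model = ToyA2C(env=None).learn(total_timesteps=5)
    print(model.num_updates)
    print(" -> ".join(cls.__name__ for cls in reversed(ToyA2C.__mro__)))
```

The design choice this illustrates is that a new algorithm only has to fill in the leaf, while the shared loop and setup stay in the parent classes.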
Paths: /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/
- A2C:
stable_baselines3/a2c
stable_baselines3/a2c/a2c.py
stable_baselines3/a2c/policies.py
- common:
stable_baselines3/common
stable_baselines3/common/base_class.py
stable_baselines3/common/on_policy_algorithm.py
stable_baselines3/common/policies.py
- utils:
stable_baselines3/common/torch_layers.py
stable_baselines3/common/buffers.py
stable_baselines3/common/type_aliases.py
stable_baselines3/common/utils.py
stable_baselines3/common/distributions.py
stable_baselines3/common/logger.py
stable_baselines3/common/noise.py
pyreverse -o pdf -p stable_baseline3_A2C_diagram /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3/a2c /opt/anaconda3/envs/rlbasic/lib/python3.8/site-packages/stable_baselines3/common
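When a PDF is more than needed, a quick text view of who inherits from whom can be sketched with the standard ast module. This is a rough alternative of my own, not a replacement for pyreverse. The function names are hypothetical, and it only reads direct base names from the source without resolving imports.

```python
import ast
from pathlib import Path
from typing import Dict, List


def _base_name(node: ast.expr) -> str:
    """Best-effort readable name for a base-class expression."""
    if isinstance(node, ast.Name):       # class A(Base)
        return node.id
    if isinstance(node, ast.Attribute):  # class A(abc.ABC)
        return node.attr
    return ast.dump(node)                # anything more exotic, e.g. Generic[T]


def class_bases(source: str) -> Dict[str, List[str]]:
    """Map every class defined in `source` to the names of its direct bases."""
    tree = ast.parse(source)
    return {
        node.name: [_base_name(base) for base in node.bases]
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
    }


def scan_package(root: str) -> Dict[str, List[str]]:
    """Collect class -> bases for all .py files under `root`.

    Classes sharing a name in different files overwrite each other,
    which is acceptable for a quick overview.
    """
    found: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        found.update(class_bases(path.read_text(encoding="utf-8")))
    return found


if __name__ == "__main__":
    demo = (
        "from abc import ABC\n"
        "class Base(ABC):\n    pass\n"
        "class Child(Base):\n    pass\n"
    )
    for name, bases in class_bases(demo).items():
        print(f"{name} <- {', '.join(bases) or 'object'}")
```

Calling scan_package on the a2c and common directories listed above should give a flat list that can be compared with the generated diagram.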
The following part has not been finished yet.
It seems that this module is designed for single-agent tasks.