---
library_name: stable-baselines3
tags:
- Pendulum-v1
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pendulum-v1
      type: Pendulum-v1
    metrics:
    - type: mean_reward
      value: -189.25 +/- 66.36
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **Pendulum-v1**
This is a trained model of a **PPO** agent playing **Pendulum-v1**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):
```bash
pip install rl_zoo3
```

Download the model into the `logs/` folder, then run the agent. If you installed the RL Zoo via pip (`pip install rl_zoo3`), these commands work from any directory:
```
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env Pendulum-v1 -orga HumanCompatibleAI -f logs/
# Run the downloaded agent
python -m rl_zoo3.enjoy --algo ppo --env Pendulum-v1 -f logs/
```

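The same checkpoint can probably also be loaded from Python without the RL Zoo CLI. The sketch below assumes the usual RL Zoo naming convention for Hub repositories (`<orga>/<algo>-<env>` with a `<algo>-<env>.zip` checkpoint), which this card does not state explicitly, and it assumes the `huggingface_sb3` and `gymnasium` packages are installed. The helper names `repo_id_and_filename`, `load_agent`, and `run_episode` are hypothetical.

```python
def repo_id_and_filename(orga, algo, env_id):
    """Build the Hub repo id and checkpoint filename.

    Assumption: the repository follows the RL Zoo naming convention.
    """
    model_name = f"{algo}-{env_id}"
    return f"{orga}/{model_name}", f"{model_name}.zip"


def load_agent(orga="HumanCompatibleAI", algo="ppo", env_id="Pendulum-v1"):
    """Download the checkpoint from the Hub and load it as an SB3 PPO model."""
    # Imported lazily so the helper above works without these packages.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    repo_id, filename = repo_id_and_filename(orga, algo, env_id)
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    return PPO.load(checkpoint_path)


def run_episode(model, env_id="Pendulum-v1"):
    """Roll out one episode with deterministic actions and return its total reward."""
    import gymnasium as gym

    env = gym.make(env_id)
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    env.close()
    return total_reward
```

Calling `run_episode(load_agent())` needs network access for the download. A single episode return is noisy, so it should not be expected to match the reported mean reward exactly.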
## Training (with the RL Zoo)
```
python -m rl_zoo3.train --algo ppo --env Pendulum-v1 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env Pendulum-v1 -f logs/ -orga HumanCompatibleAI
```

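For readers who prefer the SB3 Python API over the CLI, a roughly equivalent training run might look like the sketch below. It uses the values listed in the Hyperparameters section of this card and assumes that `n_envs`, `n_timesteps`, `policy`, and `normalize` are consumed by the RL Zoo itself, while the remaining keys are passed to the `PPO` constructor. The names `ZOO_KEYS`, `split_hyperparams`, and `train` are hypothetical, and the run is not guaranteed to reproduce the published checkpoint (seeds and wrappers may differ).

```python
from collections import OrderedDict

# Values copied from the Hyperparameters section of this card.
HYPERPARAMS = OrderedDict([
    ("clip_range", 0.2),
    ("ent_coef", 0.0),
    ("gae_lambda", 0.95),
    ("gamma", 0.9),
    ("learning_rate", 0.001),
    ("n_envs", 4),
    ("n_epochs", 10),
    ("n_steps", 1024),
    ("n_timesteps", 100000.0),
    ("policy", "MlpPolicy"),
    ("sde_sample_freq", 4),
    ("use_sde", True),
    ("normalize", False),
])

# Assumption: these keys configure the training script, not the PPO class.
ZOO_KEYS = ("n_envs", "n_timesteps", "policy", "normalize")


def split_hyperparams(hyperparams):
    """Separate training-script settings from PPO constructor arguments."""
    zoo_settings = {k: hyperparams[k] for k in ZOO_KEYS if k in hyperparams}
    ppo_kwargs = {k: v for k, v in hyperparams.items() if k not in ZOO_KEYS}
    return zoo_settings, ppo_kwargs


def train(env_id="Pendulum-v1"):
    """Train PPO directly with SB3 using the settings above."""
    # Imported lazily so split_hyperparams works without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    zoo_settings, ppo_kwargs = split_hyperparams(HYPERPARAMS)
    # normalize is False here, so no VecNormalize wrapper is added.
    env = make_vec_env(env_id, n_envs=zoo_settings["n_envs"])
    model = PPO(zoo_settings["policy"], env, **ppo_kwargs)
    model.learn(total_timesteps=int(zoo_settings["n_timesteps"]))
    return model
```

Keeping the split explicit makes it easy to see which values change the algorithm and which only change how the run is set up.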
## Hyperparameters
```python
OrderedDict([('clip_range', 0.2),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.95),
             ('gamma', 0.9),
             ('learning_rate', 0.001),
             ('n_envs', 4),
             ('n_epochs', 10),
             ('n_steps', 1024),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('sde_sample_freq', 4),
             ('use_sde', True),
             ('normalize', False)])
```

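To make these numbers more concrete, the sketch below works out what they imply for the training budget. It assumes SB3's default PPO minibatch size of 64 (`batch_size` is not listed above) and assumes that SB3 collects whole rollouts until the timestep budget is reached. All variable names are illustrative.

```python
import math

# Values from the hyperparameters above.
n_envs = 4
n_steps = 1024
n_epochs = 10
n_timesteps = int(100000.0)
gamma = 0.9

# Assumption: batch_size is not listed, so SB3's PPO default is used.
batch_size = 64

# Transitions gathered per policy update: n_steps from each of the n_envs copies.
rollout_size = n_envs * n_steps

# Whole rollouts needed to reach the budget, and the steps actually collected.
n_updates = math.ceil(n_timesteps / rollout_size)
timesteps_collected = n_updates * rollout_size

# Gradient steps per update: n_epochs passes over the rollout in minibatches.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = n_epochs * minibatches_per_epoch

# Rough effective planning horizon implied by the discount factor.
effective_horizon = 1.0 / (1.0 - gamma)

print(f"rollout size: {rollout_size}")
print(f"policy updates: {n_updates} ({timesteps_collected} timesteps)")
print(f"gradient steps per update: {gradient_steps_per_update}")
print(f"effective horizon: about {effective_horizon:.0f} steps")
```

Under these assumptions the run performs 25 policy updates over 102400 timesteps, slightly more than the nominal 100000, and the discount factor of 0.9 gives an effective horizon of about 10 steps.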
## Environment Arguments
```python
{'render_mode': 'rgb_array'}
```

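These arguments are keyword arguments for environment creation. As a minimal sketch, assuming the environment is built with `gymnasium` (the card does not name the exact package or version) and that Pendulum's rendering dependency `pygame` is installed, they could be applied as follows. The helper name `render_first_frame` is hypothetical.

```python
# Environment arguments as listed in this card.
ENV_KWARGS = {"render_mode": "rgb_array"}


def render_first_frame(env_id="Pendulum-v1", env_kwargs=None):
    """Create the environment with the given kwargs and return one rendered frame."""
    # Imported lazily; assumes gymnasium (and pygame for Pendulum rendering).
    import gymnasium as gym

    env = gym.make(env_id, **(env_kwargs or ENV_KWARGS))
    env.reset(seed=0)
    frame = env.render()  # with render_mode="rgb_array" this is an RGB image array
    env.close()
    return frame
```

The `rgb_array` mode returns frames as arrays instead of opening a window, which is what video recording on a headless machine needs.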