---
library_name: stable-baselines3
tags:
- Pendulum-v1
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pendulum-v1
      type: Pendulum-v1
    metrics:
    - type: mean_reward
      value: -189.25 +/- 66.36
      name: mean_reward
      verified: false
---

# PPO Agent playing Pendulum-v1
This is a trained model of a PPO agent playing Pendulum-v1
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):
```bash
pip install rl_zoo3
```

```bash
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env Pendulum-v1 -orga HumanCompatibleAI -f logs/
# Run the downloaded agent
python -m rl_zoo3.enjoy --algo ppo --env Pendulum-v1 -f logs/
```

If you installed the RL Zoo via pip (`pip install rl_zoo3`), you can run the commands above from any directory.

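The download step can also be done from Python instead of the CLI. The following is a minimal sketch, not the RL Zoo's own code path. It assumes the `huggingface_sb3` package is installed and that the checkpoint follows the `<algo>-<env>` naming convention, so the repository id and file name below are inferred rather than confirmed by this card. `hub_location` and `load_agent` are hypothetical helper names.

```python
def hub_location(algo: str, env_id: str, orga: str) -> tuple[str, str]:
    """Build (repo_id, filename) under the assumed '<algo>-<env>' naming convention."""
    model_name = f"{algo}-{env_id}"
    return f"{orga}/{model_name}", f"{model_name}.zip"


def load_agent(algo: str = "ppo", env_id: str = "Pendulum-v1", orga: str = "HumanCompatibleAI"):
    """Download the checkpoint from the Hugging Face Hub and load it with SB3."""
    # Imported here so that hub_location() works without these packages installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    repo_id, filename = hub_location(algo, env_id, orga)
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    return PPO.load(checkpoint_path)


# Inferred location of this model (an assumption, see above).
REPO_ID, FILENAME = hub_location("ppo", "Pendulum-v1", "HumanCompatibleAI")
```

Calling `load_agent()` needs network access. If the checkpoint was saved with a different SB3 version, loading may require extra arguments that this sketch does not cover.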
## Training (with the RL Zoo)
```bash
python -m rl_zoo3.train --algo ppo --env Pendulum-v1 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env Pendulum-v1 -f logs/ -orga HumanCompatibleAI
```

## Hyperparameters
```python
OrderedDict([('clip_range', 0.2),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.95),
             ('gamma', 0.9),
             ('learning_rate', 0.001),
             ('n_envs', 4),
             ('n_epochs', 10),
             ('n_steps', 1024),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('sde_sample_freq', 4),
             ('use_sde', True),
             ('normalize', False)])
```

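To use these values outside the RL Zoo, it helps to separate the entries that the `PPO` constructor accepts from those that describe the training setup. The sketch below is one way to do that with plain Stable Baselines3. The split in `ZOO_ONLY_KEYS` is an assumption based on the `PPO` constructor signature, and `split_hyperparams` and `train` are hypothetical helpers, not part of either library.

```python
from collections import OrderedDict

# Hyperparameters copied from the listing above.
HYPERPARAMS = OrderedDict([
    ("clip_range", 0.2), ("ent_coef", 0.0), ("gae_lambda", 0.95),
    ("gamma", 0.9), ("learning_rate", 0.001), ("n_envs", 4),
    ("n_epochs", 10), ("n_steps", 1024), ("n_timesteps", 100000.0),
    ("policy", "MlpPolicy"), ("sde_sample_freq", 4), ("use_sde", True),
    ("normalize", False),
])

# Assumption: these keys configure the training setup and are not PPO keyword arguments.
ZOO_ONLY_KEYS = {"n_envs", "n_timesteps", "policy", "normalize"}


def split_hyperparams(hyperparams):
    """Return (ppo_kwargs, setup) where ppo_kwargs can be passed to PPO(...)."""
    ppo_kwargs = {k: v for k, v in hyperparams.items() if k not in ZOO_ONLY_KEYS}
    setup = {k: v for k, v in hyperparams.items() if k in ZOO_ONLY_KEYS}
    return ppo_kwargs, setup


def train(hyperparams=HYPERPARAMS):
    """Train PPO on Pendulum-v1 with the values above (requires stable-baselines3)."""
    # Imported here so the helpers above work without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    ppo_kwargs, setup = split_hyperparams(hyperparams)
    env = make_vec_env("Pendulum-v1", n_envs=setup["n_envs"])
    model = PPO(setup["policy"], env, **ppo_kwargs)
    model.learn(total_timesteps=int(setup["n_timesteps"]))
    return model


ppo_kwargs, setup = split_hyperparams(HYPERPARAMS)
# Transitions collected per policy update: one rollout of n_steps from each environment.
rollout_size = setup["n_envs"] * ppo_kwargs["n_steps"]  # 4 * 1024 = 4096
```

Since `normalize` is `False`, the sketch does not wrap the environment in any normalization. Results from `train()` are not guaranteed to match the reported metric, because the seed and library versions are not stated here.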
## Environment Arguments
```python
{'render_mode': 'rgb_array'}
```

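These arguments are forwarded to the environment constructor. As a rough sketch, the snippet below builds the environment with them and summarizes episode returns in the same `mean +/- std` layout as the `mean_reward` metric in the metadata. It assumes `gymnasium` and `stable-baselines3` are installed. The episode count is an assumption, since the card does not say how many episodes the reported value was computed over, and `summarize` and `evaluate` are hypothetical helpers.

```python
from statistics import fmean, pstdev

ENV_ID = "Pendulum-v1"
ENV_KWARGS = {"render_mode": "rgb_array"}  # copied from this model card


def summarize(episode_returns):
    """Format returns as 'mean +/- std', using the population standard deviation."""
    mean = fmean(episode_returns)
    std = pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


def evaluate(model, n_episodes=10):
    """Roll out `model` for n_episodes (an assumed count) and summarize the returns."""
    # Imported here so that summarize() works without these packages installed.
    import gymnasium as gym
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    env = Monitor(gym.make(ENV_ID, **ENV_KWARGS))
    returns, _lengths = evaluate_policy(
        model, env, n_eval_episodes=n_episodes, return_episode_rewards=True
    )
    env.close()
    return summarize(returns)
```

Pass any loaded SB3 model to `evaluate`. The result will vary from run to run and need not reproduce the value in the metadata.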