---
library_name: stable-baselines3
tags:
- seals/CartPole-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: seals/CartPole-v0
      type: seals/CartPole-v0
    metrics:
    - type: mean_reward
      value: 500.00 +/- 0.00
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **seals/CartPole-v0**

This is a trained model of a **PPO** agent playing **seals/CartPole-v0**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):
```bash
pip install rl_zoo3
```

```bash
# Download the model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env seals/CartPole-v0 -orga HumanCompatibleAI -f logs/
# Watch the trained agent
python -m rl_zoo3.enjoy --algo ppo --env seals/CartPole-v0 -f logs/
```

Since the RL Zoo is installed as a package (`pip install rl_zoo3`), these commands work from any directory.
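
The `mean_reward` reported in the card metadata (`500.00 +/- 0.00`) is the mean and standard deviation of episode returns over a set of evaluation episodes. A minimal sketch of that computation, using only the standard library (the episode returns below are illustrative, not taken from the actual evaluation run):

```python
import statistics

def format_mean_reward(episode_returns):
    """Format episode returns as 'mean +/- std', as in the card metadata."""
    mean = statistics.mean(episode_returns)
    # Population standard deviation (ddof=0), matching NumPy's default np.std.
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"

# seals/CartPole-v0 uses fixed-length 500-step episodes, so a fully
# solved agent collects the maximum return of 500 every episode:
print(format_mean_reward([500.0] * 10))  # -> 500.00 +/- 0.00
```

A zero standard deviation therefore means the agent hit the episode's maximum return in every evaluation episode.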

## Training (with the RL Zoo)
```bash
python -m rl_zoo3.train --algo ppo --env seals/CartPole-v0 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env seals/CartPole-v0 -f logs/ -orga HumanCompatibleAI
```
## Hyperparameters
```python
OrderedDict([('batch_size', 256),
             ('clip_range', 0.4),
             ('ent_coef', 0.008508727919228772),
             ('gae_lambda', 0.9),
             ('gamma', 0.9999),
             ('learning_rate', 0.0012403278189645594),
             ('max_grad_norm', 0.8),
             ('n_envs', 8),
             ('n_epochs', 10),
             ('n_steps', 512),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs',
              {'activation_fn': <class 'torch.nn.modules.activation.ReLU'>,
               'net_arch': [{'pi': [64, 64], 'vf': [64, 64]}]}),
             ('vf_coef', 0.489343896591493),
             ('normalize', False)])
```
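
Note that this is the RL Zoo's stored configuration, not a raw set of `PPO.__init__` arguments: keys such as `n_envs`, `n_timesteps`, and `normalize` are consumed by the Zoo's training script rather than by the algorithm itself. A hedged sketch of separating the two (the set of Zoo-level keys here is an assumption based on common RL Zoo config fields, not an exhaustive list; `policy_kwargs` is omitted because its dump references a torch class):

```python
from collections import OrderedDict

hyperparams = OrderedDict([
    ("batch_size", 256),
    ("clip_range", 0.4),
    ("ent_coef", 0.008508727919228772),
    ("gae_lambda", 0.9),
    ("gamma", 0.9999),
    ("learning_rate", 0.0012403278189645594),
    ("max_grad_norm", 0.8),
    ("n_envs", 8),
    ("n_epochs", 10),
    ("n_steps", 512),
    ("n_timesteps", 100000.0),
    ("policy", "MlpPolicy"),
    ("vf_coef", 0.489343896591493),
    ("normalize", False),
])

# Keys handled by the RL Zoo training script itself (assumed, not exhaustive):
ZOO_KEYS = {"n_envs", "n_timesteps", "normalize", "env_wrapper", "frame_stack"}

policy = hyperparams["policy"]
algo_kwargs = {k: v for k, v in hyperparams.items()
               if k not in ZOO_KEYS and k != "policy"}
# algo_kwargs could then be passed along as PPO(policy, env, **algo_kwargs).
print(sorted(algo_kwargs))
```

This split is only illustrative; when reproducing the run, prefer `rl_zoo3.train` as shown above, which applies the stored configuration exactly.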

## Environment Arguments
```python
{'render_mode': 'rgb_array'}
```