---
library_name: stable-baselines3
tags:
- seals/CartPole-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: seals/CartPole-v0
      type: seals/CartPole-v0
    metrics:
    - type: mean_reward
      value: 500.00 +/- 0.00
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **seals/CartPole-v0**

This is a trained model of a **PPO** agent playing **seals/CartPole-v0**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):
```bash
pip install rl_zoo3
```

```bash
# Download the model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env seals/CartPole-v0 -orga HumanCompatibleAI -f logs/
# Watch the trained agent
python -m rl_zoo3.enjoy --algo ppo --env seals/CartPole-v0 -f logs/
```

Since the RL Zoo is installed as a package (`pip install rl_zoo3`), these commands work from any directory.
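
The `mean_reward` reported in the card metadata (`500.00 +/- 0.00`) is the mean and standard deviation of episode returns over a set of evaluation episodes. A minimal sketch of that computation, using only the standard library (the episode returns below are illustrative, not taken from the actual evaluation run):

```python
import statistics

def format_mean_reward(episode_returns):
    """Format episode returns as 'mean +/- std', as in the card metadata."""
    mean = statistics.mean(episode_returns)
    # Population standard deviation (ddof=0), matching NumPy's default np.std.
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"

# seals/CartPole-v0 uses fixed-length 500-step episodes, so a fully
# solved agent collects the maximum return of 500 every episode:
print(format_mean_reward([500.0] * 10))  # -> 500.00 +/- 0.00
```

A zero standard deviation therefore means the agent hit the episode's maximum return in every evaluation episode.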

## Training (with the RL Zoo)
```bash
python -m rl_zoo3.train --algo ppo --env seals/CartPole-v0 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env seals/CartPole-v0 -f logs/ -orga HumanCompatibleAI
```
## Hyperparameters
```python
OrderedDict([('batch_size', 256),
             ('clip_range', 0.4),
             ('ent_coef', 0.008508727919228772),
             ('gae_lambda', 0.9),
             ('gamma', 0.9999),
             ('learning_rate', 0.0012403278189645594),
             ('max_grad_norm', 0.8),
             ('n_envs', 8),
             ('n_epochs', 10),
             ('n_steps', 512),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs',
              {'activation_fn': <class 'torch.nn.modules.activation.ReLU'>,
               'net_arch': [{'pi': [64, 64], 'vf': [64, 64]}]}),
             ('vf_coef', 0.489343896591493),
             ('normalize', False)])
```
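
Note that this is the RL Zoo's stored configuration, not a raw set of `PPO.__init__` arguments: keys such as `n_envs`, `n_timesteps`, and `normalize` are consumed by the Zoo's training script rather than by the algorithm itself. A hedged sketch of separating the two (the set of Zoo-level keys here is an assumption based on common RL Zoo config fields, not an exhaustive list; `policy_kwargs` is omitted because its dump references a torch class):

```python
from collections import OrderedDict

hyperparams = OrderedDict([
    ("batch_size", 256),
    ("clip_range", 0.4),
    ("ent_coef", 0.008508727919228772),
    ("gae_lambda", 0.9),
    ("gamma", 0.9999),
    ("learning_rate", 0.0012403278189645594),
    ("max_grad_norm", 0.8),
    ("n_envs", 8),
    ("n_epochs", 10),
    ("n_steps", 512),
    ("n_timesteps", 100000.0),
    ("policy", "MlpPolicy"),
    ("vf_coef", 0.489343896591493),
    ("normalize", False),
])

# Keys handled by the RL Zoo training script itself (assumed, not exhaustive):
ZOO_KEYS = {"n_envs", "n_timesteps", "normalize", "env_wrapper", "frame_stack"}

policy = hyperparams["policy"]
algo_kwargs = {k: v for k, v in hyperparams.items()
               if k not in ZOO_KEYS and k != "policy"}
# algo_kwargs could then be passed along as PPO(policy, env, **algo_kwargs).
print(sorted(algo_kwargs))
```

This split is only illustrative; when reproducing the run, prefer `rl_zoo3.train` as shown above, which applies the stored configuration exactly.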

## Environment Arguments
```python
{'render_mode': 'rgb_array'}
```