---
library_name: stable-baselines3
tags:
- BipedalWalkerHardcore-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - metrics:
    - type: mean_reward
      value: 11.30 +/- 107.41
      name: mean_reward
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: BipedalWalkerHardcore-v3
      type: BipedalWalkerHardcore-v3
---

# SAC Agent playing BipedalWalkerHardcore-v3
This is a trained model of a SAC agent playing BipedalWalkerHardcore-v3
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

```
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo sac --env BipedalWalkerHardcore-v3 -orga sb3 -f logs/
# Replay the downloaded agent
python -m rl_zoo3.enjoy --algo sac --env BipedalWalkerHardcore-v3 -f logs/
```

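To load the checkpoint from Python instead of going through the `enjoy` script, a minimal sketch might look like the following. It assumes the `huggingface_sb3` and `stable-baselines3` packages are installed. It also assumes the checkpoint lives in the `sb3/sac-BipedalWalkerHardcore-v3` repository under the file name `sac-BipedalWalkerHardcore-v3.zip`; that is inferred from the `-orga sb3` flag and the usual `<algo>-<env>` naming, not stated on this card. The helper names `hub_repo_id`, `hub_filename`, and `load_agent` are illustrative only.

```python
ALGO = "sac"
ENV_ID = "BipedalWalkerHardcore-v3"
ORGANIZATION = "sb3"


def hub_repo_id(organization: str, algo: str, env_id: str) -> str:
    """Assumed naming convention: <organization>/<algo>-<env_id>."""
    return f"{organization}/{algo}-{env_id}"


def hub_filename(algo: str, env_id: str) -> str:
    """Assumed checkpoint name inside the repository: <algo>-<env_id>.zip."""
    return f"{algo}-{env_id}.zip"


def load_agent():
    """Download the checkpoint and load it with SB3 (needs network access)."""
    # Imported lazily so the naming helpers work without the RL packages installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import SAC

    checkpoint_path = load_from_hub(
        repo_id=hub_repo_id(ORGANIZATION, ALGO, ENV_ID),
        filename=hub_filename(ALGO, ENV_ID),
    )
    # Checkpoints saved with older versions may also need the `custom_objects`
    # argument of `SAC.load`; that is left out of this sketch.
    return SAC.load(checkpoint_path)
```

Calling `load_agent()` should return a `SAC` instance whose `predict(observation, deterministic=True)` method produces actions for an environment created separately.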
## Training (with the RL Zoo)
```
python -m rl_zoo3.train --algo sac --env BipedalWalkerHardcore-v3 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo sac --env BipedalWalkerHardcore-v3 -f logs/ -orga sb3
```

## Hyperparameters
```python
OrderedDict([('batch_size', 256),
             ('buffer_size', 1000000),
             ('ent_coef', 0.005),
             ('gamma', 0.99),
             ('gradient_steps', 1),
             ('learning_rate', 'lin_7.3e-4'),
             ('learning_starts', 10000),
             ('n_timesteps', 10000000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[400, 300])'),
             ('tau', 0.01),
             ('train_freq', 1),
             ('normalize', False)])
```

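Two of these entries are not passed to SAC verbatim. A string such as `lin_7.3e-4` is the RL Zoo shorthand for a learning rate that decays linearly from 7.3e-4 to 0 over training, and `policy_kwargs` is stored as a string that is evaluated into a dict. The sketch below illustrates that interpretation in plain Python. `parse_learning_rate`, `build_sac_kwargs`, and `RUN_LEVEL_KEYS` are hypothetical names written for this example, not RL Zoo functions, and the split between run-level and constructor keys is an assumption.

```python
from typing import Callable, Union

HYPERPARAMS = {
    "batch_size": 256,
    "buffer_size": 1_000_000,
    "ent_coef": 0.005,
    "gamma": 0.99,
    "gradient_steps": 1,
    "learning_rate": "lin_7.3e-4",
    "learning_starts": 10_000,
    "n_timesteps": 1e7,
    "policy": "MlpPolicy",
    "policy_kwargs": "dict(net_arch=[400, 300])",
    "tau": 0.01,
    "train_freq": 1,
    "normalize": False,
}

# Assumption: these keys configure the training run, not the SAC constructor.
RUN_LEVEL_KEYS = ("n_timesteps", "normalize")


def parse_learning_rate(spec: Union[str, float]) -> Callable[[float], float]:
    """Turn 'lin_<value>' into a linear schedule, or a number into a constant one."""
    if isinstance(spec, str):
        kind, initial = spec.split("_")
        if kind != "lin":
            raise ValueError(f"Unsupported schedule prefix: {kind!r}")
        initial_value = float(initial)
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return lambda progress_remaining: progress_remaining * initial_value
    constant = float(spec)
    return lambda progress_remaining: constant


def build_sac_kwargs(hyperparams: dict) -> dict:
    """Drop run-level keys and convert string-encoded entries."""
    kwargs = {k: v for k, v in hyperparams.items() if k not in RUN_LEVEL_KEYS}
    kwargs["learning_rate"] = parse_learning_rate(kwargs["learning_rate"])
    # eval is used only because the string is a trusted literal from this card.
    kwargs["policy_kwargs"] = eval(kwargs["policy_kwargs"])
    return kwargs


sac_kwargs = build_sac_kwargs(HYPERPARAMS)
schedule = sac_kwargs["learning_rate"]
# Learning rate at the start, midpoint, and end of training.
print(schedule(1.0), schedule(0.5), schedule(0.0))
```

The resulting dict could then be passed as `SAC(env=env, **sac_kwargs)`, assuming an environment `env` has been created separately.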