---
library_name: stable-baselines3
tags:
- BipedalWalkerHardcore-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - metrics:
    - type: mean_reward
      value: 11.30 +/- 107.41
      name: mean_reward
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: BipedalWalkerHardcore-v3
      type: BipedalWalkerHardcore-v3
---

# **SAC** Agent playing **BipedalWalkerHardcore-v3**
This is a trained model of a **SAC** agent playing **BipedalWalkerHardcore-v3**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).

The RL Zoo is a training framework for Stable Baselines3
reinforcement learning agents,
with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
SB3: https://github.com/DLR-RM/stable-baselines3<br/>
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

```
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo sac --env BipedalWalkerHardcore-v3 -orga sb3 -f logs/
# Run the downloaded agent (enjoy.py lives in the root of a cloned rl-baselines3-zoo checkout)
python enjoy.py --algo sac --env BipedalWalkerHardcore-v3 -f logs/
```
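
The same checkpoint can also be loaded from Python rather than through the RL Zoo scripts. The following is a minimal sketch, not the Zoo's own code path. It assumes that `stable-baselines3`, `huggingface_sb3`, and `gymnasium[box2d]` are installed, and that the repository id and file name follow the `<orga>/<algo>-<env>` pattern suggested by the command above. The helpers `hub_location` and `evaluate_from_hub` are hypothetical names introduced here, and checkpoints saved with older library versions may need extra arguments to load.

```python
def hub_location(algo: str, env_id: str, orga: str) -> tuple[str, str]:
    """Guess the Hub repo id and checkpoint file name (assumed naming pattern)."""
    model_name = f"{algo}-{env_id}"
    return f"{orga}/{model_name}", f"{model_name}.zip"


def evaluate_from_hub(env_id: str = "BipedalWalkerHardcore-v3",
                      orga: str = "sb3",
                      n_episodes: int = 10):
    """Download the SAC checkpoint and return (mean_reward, std_reward)."""
    # Third-party imports stay local so hub_location() works without them.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import SAC
    from stable_baselines3.common.evaluation import evaluate_policy

    repo_id, filename = hub_location("sac", env_id, orga)
    # Fetches the .zip checkpoint into the local Hugging Face cache.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = SAC.load(checkpoint_path)

    env = gym.make(env_id)
    try:
        return evaluate_policy(model, env, n_eval_episodes=n_episodes,
                               deterministic=True)
    finally:
        env.close()


# Example call (needs network access and the packages listed above):
# mean_reward, std_reward = evaluate_from_hub()
```

The returned pair has the same form as the `mean_reward` entry in the metadata (mean +/- standard deviation over evaluation episodes). Numbers obtained this way will not necessarily match the reported value.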

## Training (with the RL Zoo)
```
# Train the agent (train.py lives in the root of a cloned rl-baselines3-zoo checkout)
python train.py --algo sac --env BipedalWalkerHardcore-v3 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo sac --env BipedalWalkerHardcore-v3 -f logs/ -orga sb3
```

## Hyperparameters
```python
OrderedDict([('batch_size', 256),
             ('buffer_size', 1000000),
             ('ent_coef', 0.005),
             ('gamma', 0.99),
             ('gradient_steps', 1),
             ('learning_rate', 'lin_7.3e-4'),
             ('learning_starts', 10000),
             ('n_timesteps', 10000000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[400, 300])'),
             ('tau', 0.01),
             ('train_freq', 1),
             ('normalize', False)])
```
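
Two of the entries above are strings that the RL Zoo interprets before building the agent: `'lin_7.3e-4'` denotes a learning rate that decays linearly from 7.3e-4 to 0 over training, and `policy_kwargs` is a Python expression. Below is a minimal sketch of that interpretation. It assumes the SB3 convention that a schedule is a function of `progress_remaining` (1.0 at the start of training, 0.0 at the end). `linear_schedule` and `parse_learning_rate` are illustrative names here, not necessarily the Zoo's exact code.

```python
from typing import Callable, Union

Schedule = Callable[[float], float]


def linear_schedule(initial_value: float) -> Schedule:
    """Return a schedule that decays linearly from initial_value to 0."""
    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start) to 0.0 (end of training).
        return progress_remaining * initial_value
    return schedule


def parse_learning_rate(spec: Union[str, float]) -> Union[float, Schedule]:
    """Turn a 'lin_<value>' string into a schedule; pass plain numbers through."""
    if isinstance(spec, str) and spec.startswith("lin_"):
        return linear_schedule(float(spec[len("lin_"):]))
    return float(spec)


# Keyword arguments mirroring the table above. 'n_timesteps' and 'normalize'
# are left out because they are handled by the Zoo, not by the SAC constructor.
sac_kwargs = dict(
    policy="MlpPolicy",
    batch_size=256,
    buffer_size=1_000_000,
    ent_coef=0.005,
    gamma=0.99,
    gradient_steps=1,
    learning_rate=parse_learning_rate("lin_7.3e-4"),
    learning_starts=10_000,
    policy_kwargs=dict(net_arch=[400, 300]),
    tau=0.01,
    train_freq=1,
)

lr = sac_kwargs["learning_rate"]
# Learning rate at the start, midpoint, and end of training.
print(lr(1.0), lr(0.5), lr(0.0))
```

With `stable-baselines3` installed, these arguments could be passed as `SAC(env=env, **sac_kwargs)`, followed by `model.learn(total_timesteps=10_000_000)` to match `n_timesteps`.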