benchmark.md agents hyperparameters #38

Closed
Scitator opened this issue Sep 9, 2019 · 7 comments

@Scitator

Scitator commented Sep 9, 2019

Hi,

Thanks for the amazing lib, an open-source RL benchmark is really valuable nowadays.
Nevertheless, I am wondering where I can find the hyperparameters used for the benchmarked agents, like network architecture, optimizer parameters and other important RL stuff ;)

@araffin
Owner

araffin commented Sep 9, 2019

Hello,

For each trained agent, you have a config.yml file that contains the hyperparameters (if not specified, then the default ones of stable-baselines were used)

Example: TD3 on HalfCheetahBulletEnv-v0

Note: this file was not present in early versions of the rl zoo; in that case, you need to look at the yaml files instead.

Example: A2C on Atari games

Please note that this is not a proper benchmark, in the sense that the reported values correspond to only one seed. It is meant more to check (maximal) algorithm performance, to find potential bugs, and to give people access to pretrained agents.

@araffin araffin added the question Further information is requested label Sep 9, 2019
@Scitator
Author

Scitator commented Sep 9, 2019

Okay, I see.
Then could you please share the hyperparameters for:
dqn-MsPacmanNoFrameskip-v4
dqn-EnduroNoFrameskip-v4
ddpg-BipedalWalker-v2
sac-BipedalWalker-v2
sac-BipedalWalkerHardcore-v2
?
Currently, I am benchmarking different architectures and would like to reproduce some open-source results (article benchmarks are good, but I trust more open-source solutions).

@araffin
Owner

araffin commented Sep 10, 2019

> Then could you please share the hyperparameters for:
>
> dqn-MsPacmanNoFrameskip-v4
> dqn-EnduroNoFrameskip-v4

Those are present in hyperparams/dqn.yml (the atari key)

> ddpg-BipedalWalker-v2
> sac-BipedalWalker-v2
> sac-BipedalWalkerHardcore-v2

There are config files for each one of those in the corresponding folder.

Note: SACCustomPolicy corresponds to the policy described in the original paper ([256, 256] with ReLU)

@Scitator
Author

Scitator commented Sep 10, 2019

So, could you please confirm that I got all the hyperparameters right?

atari-dqn (MsPacmanNoFrameskip and EnduroNoFrameskip)

- nature cnn extractor ([32, 64, 64], relu)
- Adam optimizer with 1e-4 learning rate
- initial buffer size - 10k observations
- buffer size - 10k observations
- batch size - 32 observations
- hard target net update each 1k batches
- exploration: e-greedy from 1.0 to 0.01 for 10% of total number of steps in the environment
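
For readers following along, the exploration item above can be sketched as a simple linear schedule. This is a minimal illustration, not the stable-baselines implementation: the function name `linear_epsilon` and the example step budget are assumptions made here, while the 1.0 to 0.01 range and the 10% fraction come from the list.

```python
def linear_epsilon(step, total_steps, eps_start=1.0, eps_final=0.01,
                   exploration_fraction=0.1):
    """Linearly decay epsilon over the first `exploration_fraction` of training.

    After the decay window, epsilon stays at `eps_final`.
    """
    decay_steps = exploration_fraction * total_steps
    # Fraction of the decay window already consumed, clipped to 1.0.
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_final - eps_start)


# Illustrative budget only (hypothetical value).
total_steps = 1_000_000
for step in (0, 50_000, 100_000, 500_000):
    print(step, round(linear_epsilon(step, total_steps), 4))
```

With this shape, epsilon reaches its final value once 10% of the step budget has been consumed and is constant afterwards.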

ddpg (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

- mlp with [64, 64] hiddens and relu
- Adam optimizer with 1e-4 learning rate for actor and 1e-4 for critic
- initial buffer size - ?
- buffer size - 10k observations
- batch size - 256 observations
- soft target update each batch with tau=0.001
- exploration: adaptive parameter noise with target std=0.287

sac (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

- mlp with [256, 256] hiddens and relu
- Adam optimizer with 3e-4 learning rate for both actor and critic
- initial buffer size - 1000 observations
- buffer size - 10k observations
- batch size - 64 observations
- soft target update each batch with tau=0.005
- exploration: ?
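
The "soft target update" items in the DDPG and SAC lists refer to Polyak averaging of the target network weights. A minimal sketch on plain Python lists, assuming the usual convention that `tau` weights the online parameters (the helper name `soft_update` is hypothetical, not a library function):

```python
def soft_update(target_params, online_params, tau):
    """Return new target parameters: (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy parameters: the target starts at 0, the online network sits at 1.
target = [0.0, 0.0]
online = [1.0, 1.0]

# One update per batch, as in the lists above (tau=0.005 is the SAC value).
for _ in range(3):
    target = soft_update(target, online, tau=0.005)
print(target)
```

A smaller `tau` (0.001 for DDPG above) makes the target network track the online network more slowly.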

overall

- for benchmarking purposes 150k steps were taken in the environment
(with a frame skip of 4, that means 600k steps in the real environment)
- all benchmarks were done with n-step=1 q-learning
- and with single thread run

Thanks!

@araffin
Owner

araffin commented Sep 10, 2019

> for benchmarking purposes 150k steps were taken in the environment

The benchmark is done only at the end of training. The number of training timesteps is also in the config file, for atari, it is the standard 10M steps (so 40M steps in the real env because of the frame skip), for the others, check the config files.
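
The frame-skip arithmetic above can be checked in a couple of lines. The helper name `emulator_frames` is hypothetical; the 10M steps and the frame skip of 4 are the values mentioned in this thread.

```python
FRAME_SKIP = 4  # each agent step repeats the chosen action for 4 emulator frames


def emulator_frames(agent_steps, frame_skip=FRAME_SKIP):
    """Number of real environment frames consumed by `agent_steps` agent steps."""
    return agent_steps * frame_skip


print(emulator_frames(10_000_000))  # 40000000
```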

> - all benchmarks were done with n-step=1 q-learning

yes

> - and with single thread run

yes

> atari-dqn (MsPacmanNoFrameskip and EnduroNoFrameskip)

Looks good, note that this is a Prioritized Double dueling dqn.
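
To make the "Double" part concrete, here is a minimal sketch of a 1-step Double DQN target for a single transition. It is an illustration in plain Python, not the stable-baselines code; the function name and the discount `gamma=0.99` are assumptions, and prioritized replay and the dueling head are not shown.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """1-step Double DQN target for one transition.

    The online network selects the next action, the target network
    evaluates it. `next_q_online` and `next_q_target` are lists of
    Q-values for the next state, one entry per action.
    """
    # Action selection with the online network.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # Action evaluation with the target network (no bootstrap at episode end).
    bootstrap = 0.0 if done else next_q_target[best_action]
    return reward + gamma * bootstrap


# The online net prefers action 1, so the target net's value for action 1 is used.
print(double_dqn_target(1.0, False, [0.1, 0.9], [5.0, 2.0]))
```

Vanilla DQN would instead take the maximum over `next_q_target` directly, which tends to overestimate values.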

> ddpg (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

These do not look like the ones found in https://github.com/araffin/rl-baselines-zoo/blob/master/trained_agents/ddpg/BipedalWalker-v2/config.yml

Yes, you will need several seeds to get a good one with DDPG. Also, I did not manage to make it work with the Hardcore version yet.

> sac (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

For SAC, the learning rate is linearly annealed (it helps to avoid a catastrophic drop in performance).
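
A linearly annealed learning rate can be sketched as a function of the remaining training progress. This is a minimal illustration under two assumptions: that progress runs from 1 (start of training) down to 0 (end), and that the initial value is the 3e-4 quoted earlier in the thread. Check the config file for the actual setting.

```python
def linear_schedule(initial_value):
    """Return a function mapping remaining progress (1 -> 0) to a learning rate."""
    def schedule(progress_remaining):
        # Learning rate shrinks proportionally to the progress that is left.
        return progress_remaining * initial_value
    return schedule


lr = linear_schedule(3e-4)  # initial value assumed from the list above
for progress in (1.0, 0.5, 0.0):
    print(progress, lr(progress))
```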

> - buffer size - 10k observations

it is 10e6 in the config file...
https://github.com/araffin/rl-baselines-zoo/blob/master/trained_agents/sac/BipedalWalker-v2/config.yml

> - exploration: ?

It is done by SAC automatically using the stochastic policy.
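
As a rough illustration of what "using the stochastic policy" means: SAC samples actions from a tanh-squashed Gaussian during training, so no separate noise process is needed. The sketch below uses a 1-D action and hypothetical helper names; in the real algorithm `mean` and `log_std` are outputs of the policy network.

```python
import math
import random


def sample_action(mean, log_std, rng):
    """Training-time action: sample from a Gaussian, then squash to (-1, 1)."""
    std = math.exp(log_std)
    return math.tanh(rng.gauss(mean, std))


def deterministic_action(mean):
    """Evaluation-time action: squash the mean, no sampling."""
    return math.tanh(mean)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print([round(sample_action(0.0, -1.0, rng), 3) for _ in range(3)])
print(deterministic_action(0.0))
```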

@Scitator
Author

Thanks for the reply, now it looks much more realistic :)
Nevertheless, what does n_timesteps in the benchmark.md table mean?
number of steps during evaluation?

@araffin
Owner

araffin commented Sep 10, 2019

> number of steps during evaluation?

yes, I could fix either the number of episodes or the number of steps, I chose the latter.
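
Evaluating over a fixed step budget rather than a fixed number of episodes can be sketched as follows. This is not the zoo's evaluation script: `ToyEnv` is a hypothetical stand-in environment, and dropping the last unfinished episode is an assumption made for this example.

```python
class ToyEnv:
    """Hypothetical environment: fixed-length episodes, reward 1.0 per step."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return 0, 1.0, done, {}


def evaluate_for_steps(env, policy, n_timesteps):
    """Run `policy` for exactly `n_timesteps` steps; return completed episode returns."""
    episode_returns = []
    current_return = 0.0
    obs = env.reset()
    for _ in range(n_timesteps):
        obs, reward, done, _info = env.step(policy(obs))
        current_return += reward
        if done:
            episode_returns.append(current_return)
            current_return = 0.0
            obs = env.reset()
    # Any unfinished episode at the end of the budget is ignored here.
    return episode_returns


returns = evaluate_for_steps(ToyEnv(), policy=lambda obs: 0, n_timesteps=35)
print(len(returns), sum(returns) / len(returns))
```

With a step budget, the number of evaluated episodes varies with episode length, which is why the benchmark table reports steps rather than an episode count.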
