
Evaluation runs way too many evaluation episodes #296

Closed
JakobThumm opened this issue Oct 11, 2022 · 4 comments

Comments

@JakobThumm

JakobThumm commented Oct 11, 2022

Describe the bug
The evaluation runs far more episodes than `n_eval_episodes` (>100 evaluation episodes, or it even runs indefinitely).

Code example
For my custom env, the evaluation runs for >100 episodes, even though I set the number of eval episodes to 3.

I was able to reproduce the error for a common environment:

```shell
python train.py --algo sac --env BipedalWalkerHardcore-v3 --yaml-file hyperparameters/sac.yml -P --seed 42 --eval-freq 5000 --eval-episodes 3 --n-eval-envs 1
```

sac.yml

```yaml
BipedalWalkerHardcore-v3:
  env_wrapper:
    - gym.wrappers.TimeLimit:
        max_episode_steps: 1000
  n_timesteps: !!float 1e7
  policy: 'MlpPolicy'
  learning_rate: lin_7.3e-4
  buffer_size: 1000000
  batch_size: 256
  ent_coef: 0.005
  gamma: 0.99
  tau: 0.01
  train_freq: 1
  gradient_steps: 1
  learning_starts: 10000
  policy_kwargs: "dict(net_arch=[256, 256])"
```

Note that this issue occurs if and only if I change the net_arch from [400, 300] to [256, 256]. It also does not occur on seed 0, but it does on seed 42.

Apparently, the evaluation does more than I expect. I would assume the evaluation simply runs for the given number of episodes and then training continues.

System Info
Describe the characteristics of your environment:

  • sb3-contrib 1.6.1
  • rl-zoo3 1.6.2.post1 (from source)
  • The problem occurs on both CUDA and CPU.
  • Python version: 3.8.13

Additional Info
I created a simple wrapper that prints a statement whenever a new episode begins to debug this issue.
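A minimal sketch of such a debug wrapper (the class and variable names here are hypothetical, not from my actual code). It works with any Gym-style object exposing `reset()`/`step()`; a dummy env is included so the sketch is self-contained:

```python
class EpisodeLogger:
    """Hypothetical debug wrapper: prints whenever a new episode begins."""

    def __init__(self, env):
        self.env = env
        self.episode_count = 0

    def reset(self, **kwargs):
        # reset() marks the start of a new episode
        self.episode_count += 1
        print(f"Starting episode {self.episode_count}")
        return self.env.reset(**kwargs)

    def step(self, action):
        return self.env.step(action)

    def __getattr__(self, name):
        # Delegate everything else to the wrapped env
        return getattr(self.env, name)


class DummyEnv:
    """Stand-in for a real Gym env, just for demonstration."""

    def reset(self):
        return 0

    def step(self, action):
        # obs, reward, done, info
        return 0, 0.0, True, {}


env = EpisodeLogger(DummyEnv())
for _ in range(3):
    env.reset()
    env.step(0)
print(env.episode_count)  # 3
```

Wrapping the evaluation env like this made the excess episodes visible.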

@araffin
Member

araffin commented Oct 11, 2022

Hello,
how do you know it is doing more than 3 evaluation episodes?

```yaml
env_wrapper:
  - gym.wrappers.TimeLimit:
      max_episode_steps: 1000
```

Why are you adding a timelimit?
If you do so, you need to add a Monitor wrapper afterward so the time limit is taken into account.
Otherwise the evaluation will only use the original termination (see DLR-RM/stable-baselines3#181 for why we are doing that).

EDIT: to check the number of evaluations:

```python
import numpy as np

evaluations = np.load("logs/sac/BipedalWalkerHardcore-v3_12/evaluations.npz")
print(evaluations["ep_lengths"].shape)
```

@JakobThumm
Author

JakobThumm commented Oct 11, 2022

Why are you adding a timelimit?

In my custom environment, I would like to limit the episode length. Isn't the TimeLimit wrapper the way to go, then?

If you do so, you need to add a monitor file afterward so it is taken into account.

I added the basic common.monitor.Monitor wrapper, which fixed the issue.
I still don't fully understand why the Monitor is needed, even after reading the linked issue. However, if simply adding a Monitor fixes the problem, I'm happy :) Thank you, Antonin
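For reference, the working env_wrapper section now looks roughly like this (a sketch based on my setup; my assumption is that rl-zoo applies the wrappers in list order, so Monitor must come after TimeLimit to see the truncation):

```yaml
BipedalWalkerHardcore-v3:
  env_wrapper:
    - gym.wrappers.TimeLimit:
        max_episode_steps: 1000
    - stable_baselines3.common.monitor.Monitor
```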

@araffin
Member

araffin commented Oct 11, 2022

I still don't fully understand why we need the monitor after reading the linked issue.

Best is to take a look at the code: https://github.com/DLR-RM/stable-baselines3/blob/52c29dc497fa2eb235d0476b067bed8ac488fe64/stable_baselines3/common/evaluation.py#L103-L114

@JakobThumm
Author

This clarifies the matter, thanks 👍
