
Evaluation runs way too many evaluation episodes #296

Closed
JakobThumm opened this issue Oct 11, 2022 · 4 comments

Comments

@JakobThumm

JakobThumm commented Oct 11, 2022

Describe the bug
The evaluation runs far more episodes than `n_eval_episodes` (>100 evaluation episodes, or it even runs indefinitely).

Code example
For my custom env, the evaluation runs for >100 episodes, even though I set the number of eval episodes to 3.

I was able to reproduce the error for a common environment:

```shell
python train.py --algo sac --env BipedalWalkerHardcore-v3 --yaml-file hyperparameters/sac.yml -P --seed 42 --eval-freq 5000 --eval-episodes 3 --n-eval-envs 1
```

sac.yml

```yaml
BipedalWalkerHardcore-v3:
  env_wrapper:
    - gym.wrappers.TimeLimit:
        max_episode_steps: 1000
  n_timesteps: !!float 1e7
  policy: 'MlpPolicy'
  learning_rate: lin_7.3e-4
  buffer_size: 1000000
  batch_size: 256
  ent_coef: 0.005
  gamma: 0.99
  tau: 0.01
  train_freq: 1
  gradient_steps: 1
  learning_starts: 10000
  policy_kwargs: "dict(net_arch=[256, 256])"
```

Note that this issue occurs if and only if I change the net_arch from [400, 300] to [256, 256]. It also does not occur on seed 0, but it does on seed 42.

Apparently, the evaluation does more than I expect. I would assume the evaluation simply runs for the given number of episodes and then training continues.

System Info
Describe the characteristics of your environment:

  • sb3-contrib 1.6.1
  • rl-zoo3 1.6.2.post1 (from source)
  • The problem occurs on both CUDA and CPU.
  • Python version: 3.8.13

Additional Info
I created a simple wrapper that prints a statement whenever a new episode begins to debug this issue.
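A minimal sketch of such a debug wrapper (the class and variable names here are hypothetical, not from my actual code). It works with any Gym-style object exposing `reset()`/`step()`; a dummy env is included so the sketch is self-contained:

```python
class EpisodeLogger:
    """Hypothetical debug wrapper: prints whenever a new episode begins."""

    def __init__(self, env):
        self.env = env
        self.episode_count = 0

    def reset(self, **kwargs):
        # reset() marks the start of a new episode
        self.episode_count += 1
        print(f"Starting episode {self.episode_count}")
        return self.env.reset(**kwargs)

    def step(self, action):
        return self.env.step(action)

    def __getattr__(self, name):
        # Delegate everything else to the wrapped env
        return getattr(self.env, name)


class DummyEnv:
    """Stand-in for a real Gym env, just for demonstration."""

    def reset(self):
        return 0

    def step(self, action):
        # obs, reward, done, info
        return 0, 0.0, True, {}


env = EpisodeLogger(DummyEnv())
for _ in range(3):
    env.reset()
    env.step(0)
print(env.episode_count)  # 3
```

Wrapping the evaluation env like this made the excess episodes visible.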

@araffin
Member

araffin commented Oct 11, 2022

Hello,
how do you know it is doing more than 3 evaluation episodes?

```yaml
env_wrapper:
  - gym.wrappers.TimeLimit:
      max_episode_steps: 1000
```

Why are you adding a timelimit?
If you do so, you need to add a Monitor wrapper afterward so the time limit is taken into account.
Otherwise the evaluation will only use the original termination (see DLR-RM/stable-baselines3#181 for why we are doing that).

EDIT: to check the number of evaluations:

```python
import numpy as np

evaluations = np.load("logs/sac/BipedalWalkerHardcore-v3_12/evaluations.npz")
print(evaluations["ep_lengths"].shape)
```

@JakobThumm
Author

JakobThumm commented Oct 11, 2022

Why are you adding a timelimit?

In my custom environment, I would like to limit the episode length. Isn't the TimeLimit wrapper the way to go, then?

If you do so, you need to add a monitor file afterward so it is taken into account.

I added the basic common.monitor.Monitor wrapper, which fixed the issue.
I still don't fully understand why the Monitor is needed, even after reading the linked issue. However, if simply adding a Monitor fixes the problem, I'm happy :) Thank you, Antonin
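For reference, the working env_wrapper section now looks roughly like this (a sketch based on my setup; my assumption is that rl-zoo applies the wrappers in list order, so Monitor must come after TimeLimit to see the truncation):

```yaml
BipedalWalkerHardcore-v3:
  env_wrapper:
    - gym.wrappers.TimeLimit:
        max_episode_steps: 1000
    - stable_baselines3.common.monitor.Monitor
```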

@araffin
Member

araffin commented Oct 11, 2022

I still don't fully understand why we need the monitor after reading the linked issue.

Best is to take a look at the code: https://github.com/DLR-RM/stable-baselines3/blob/52c29dc497fa2eb235d0476b067bed8ac488fe64/stable_baselines3/common/evaluation.py#L103-L114

@JakobThumm
Author

This clarifies the matter, thanks 👍
