Evaluation runs way too many evaluation episodes #296
Comments
Hello,
Why are you adding a time limit?
EDIT: to check the number of evaluations:
import numpy as np
evaluations = np.load("logs/sac/BipedalWalkerHardcore-v3_12/evaluations.npz")
# Shape is (number of evaluations, episodes per evaluation)
print(evaluations["ep_lengths"].shape)
In my custom environment, I would like to limit the episode length. Isn't the TimeLimit wrapper the way to go then?
I added the basic
The best way to see what happens is to look at the code: https://github.com/DLR-RM/stable-baselines3/blob/52c29dc497fa2eb235d0476b067bed8ac488fe64/stable_baselines3/common/evaluation.py#L103-L114
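For readers who do not want to open the link, here is a minimal sketch of the episode-counting logic as I understand it from those lines. It is a simplification, not the library's actual code: the function name `count_eval_episodes` and the list of `(done, info)` tuples are made up for illustration. The assumption is that, when the environment is wrapped in a Monitor, an episode is only counted if `info` contains an `"episode"` key. A `done` signal produced by a wrapper the Monitor does not see (for example a time limit applied outside the Monitor) would then never increase the count.

```python
from typing import Dict, Iterable, Tuple


def count_eval_episodes(
    steps: Iterable[Tuple[bool, Dict]],
    n_eval_episodes: int,
    is_monitor_wrapped: bool = True,
) -> Tuple[int, int]:
    """Hypothetical, simplified model of the evaluation loop.

    Returns (episodes counted, environment steps consumed).
    """
    episodes = 0
    consumed = 0
    for done, info in steps:
        if episodes >= n_eval_episodes:
            break  # target reached, evaluation stops
        consumed += 1
        if done:
            if is_monitor_wrapped:
                # Assumption: only episodes reported by the Monitor
                # (via the "episode" key) are counted.
                if "episode" in info:
                    episodes += 1
            else:
                # Without a Monitor, every done counts as an episode end.
                episodes += 1
    return episodes, consumed


# A time limit outside the Monitor: done every 5 steps, but no "episode" key.
unseen = [(i % 5 == 4, {}) for i in range(30)]
# The Monitor sees the episode end and reports it in info.
seen = [
    (i % 5 == 4, {"episode": {"r": 1.0, "l": 5}} if i % 5 == 4 else {})
    for i in range(30)
]

print(count_eval_episodes(unseen, n_eval_episodes=3))  # never reaches the target
print(count_eval_episodes(seen, n_eval_episodes=3))    # stops after 3 episodes
```

Under this reading, the evaluation keeps stepping until enough Monitor-reported episodes have finished, which would explain far more than `n_eval_episodes` visible resets.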
This clarifies the matter, thanks 👍
Describe the bug
The evaluation runs for more than n_eval_episodes episodes (more than 100, or possibly without ever stopping).
Code example
For my custom env, the evaluation runs for >100 episodes, even though I set the number of eval episodes to 3.
I was able to reproduce the error with a common environment:
sac.yml
Note that this issue occurs if and only if I change the net_arch from [400, 300] to [256, 256]. It also does not occur with seed 0, but it does with seed 42.
Apparently, the evaluation is doing more than I expect. I would assume that the evaluation just runs for the given number of episodes and then training continues.
System Info
Describe the characteristic of your environment:
rl-zoo3 1.6.2.post1 (from source)
Additional Info
I created a simple wrapper that prints a statement whenever a new episode begins to debug this issue.
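The wrapper itself is not shown above, so the following is only a sketch of what such a debugging wrapper could look like. The class names `EpisodeStartLogger` and `DummyEnv` are hypothetical, and the sketch is duck-typed so that it runs without gym installed; a real version would more likely subclass `gym.Wrapper`.

```python
class EpisodeStartLogger:
    """Hypothetical debugging wrapper: prints a line each time reset() is called."""

    def __init__(self, env):
        self.env = env
        self.episode_count = 0

    def reset(self, **kwargs):
        # Every reset marks the start of a new episode.
        self.episode_count += 1
        print(f"New episode started (#{self.episode_count})")
        return self.env.reset(**kwargs)

    def step(self, action):
        # Steps are passed through unchanged.
        return self.env.step(action)


class DummyEnv:
    """Stand-in environment (assumption) with a fixed episode length."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return 0, 0.0, done, {}


env = EpisodeStartLogger(DummyEnv())
for _ in range(2):
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(0)
```

Counting the printed lines during one evaluation phase gives the number of resets, which can then be compared with the configured number of evaluation episodes.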