[Question] Evaluation helper on Monitor wrapped environments #894
To make the

```python
import gym
from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.monitor import Monitor

env = gym.make("SpaceInvadersNoFrameskip-v4")
env = Monitor(env)
env = AtariWrapper(env, terminal_on_life_loss=True)
obs = env.reset()
while True:
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        print(info)
        break
print(env.get_episode_rewards())
print(env.unwrapped.ale.lives())
# Output:
# {'lives': 2, 'episode_frame_number': 836, 'frame_number': 836}
# []
# 2
```

However, if we switch the wrapping order, losing a life is then treated as the end of an episode.

```python
env = gym.make("SpaceInvadersNoFrameskip-v4")
env = AtariWrapper(env, terminal_on_life_loss=True)
env = Monitor(env)
obs = env.reset()
while True:
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        print(info)
        break
print(env.get_episode_rewards())
print(env.unwrapped.ale.lives())
# Output:
# {'lives': 2, 'episode_frame_number': 1505, 'frame_number': 1505, 'episode': {'r': 11.0, 'l': 368, 't': 1.910763}}
# [11.0]
# 2
```
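The two outputs above can be reproduced without Atari in a minimal sketch. The classes below (`ToyLivesEnv`, `ToyEpisodicLife`, `ToyMonitor`) and the helper `play_one_game` are hypothetical, simplified stand-ins written only for illustration, not the real gym or SB3 wrappers, and they assume the old 4-tuple `step` API used in this thread. The point is only the ordering: a monitor records whatever `done` it receives from the env it wraps, so a life-loss `done` produced by an outer wrapper never reaches an inner monitor.

```python
# Hypothetical toy stand-ins (NOT the real gym/SB3 classes), old 4-tuple step API.


class ToyLivesEnv:
    """Game with 3 lives; one life is lost every 2 steps; done only at game over."""

    def __init__(self, lives=3, steps_per_life=2):
        self.start_lives = lives
        self.steps_per_life = steps_per_life

    def reset(self):
        self.lives, self.t = self.start_lives, 0
        return 0

    def step(self, action):
        self.t += 1
        if self.t % self.steps_per_life == 0:
            self.lives -= 1
        return 0, 1.0, self.lives == 0, {"lives": self.lives}


class ToyEpisodicLife:
    """Reports done on every life loss; resets the wrapped env only at real game over."""

    def __init__(self, env):
        self.env = env
        self.lives = None
        self.was_real_done = True
        self.last_obs = None

    def reset(self):
        if self.was_real_done:
            self.lives = None
            self.last_obs = self.env.reset()
        return self.last_obs  # after a life loss, simply continue the same game

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.last_obs, self.was_real_done = obs, done
        lives = info["lives"]
        if self.lives is not None and lives < self.lives:
            done = True  # life lost: signal done although the game is not over
        self.lives = lives
        return obs, reward, done, info


class ToyMonitor:
    """Adds info["episode"] whenever the env it wraps reports done."""

    def __init__(self, env):
        self.env = env
        self.rewards = []
        self.episode_rewards = []

    def reset(self):
        self.rewards = []
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rewards.append(reward)
        if done:
            ep_return = sum(self.rewards)
            info = dict(info, episode={"r": ep_return, "l": len(self.rewards)})
            self.episode_rewards.append(ep_return)
        return obs, reward, done, info


def play_one_game(env):
    """Reset after every done until lives reach 0; record whether each done info has "episode"."""
    env.reset()
    flags = []
    while True:
        _, _, done, info = env.step(0)
        if done:
            flags.append("episode" in info)
            if info["lives"] == 0:
                return flags
            env.reset()


monitor_inside = ToyMonitor(ToyLivesEnv())
print(play_one_game(ToyEpisodicLife(monitor_inside)), monitor_inside.episode_rewards)
# [False, False, True] [6.0]
monitor_outside = ToyMonitor(ToyEpisodicLife(ToyLivesEnv()))
print(play_one_game(monitor_outside), monitor_outside.episode_rewards)
# [True, True, True] [2.0, 2.0, 2.0]
```

With the monitor innermost, only the real game over carries `"episode"`; with the monitor outermost, every life loss is logged as a separate episode.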
@araffin @Meyer99 I actually had an issue related to this. I noticed that the order of the wrappers changes the behavior (and I think this is not desirable). First

For the first case, "episode" is not in I think the implementation first wrapping the environment with
@Meyer99 ok, I've seen in other issues (#477 and #789) that the combination of At first it looked like a bug, but in #789 @araffin says it's by design (in this comment). In my case, I managed to make it work this way:

```python
from gym.wrappers import TimeLimit
from stable_baselines3.common.env_util import make_vec_env

base_env = MyCustomEnv(...)
env = make_vec_env(
    env_id=TimeLimit,
    n_envs=10,  # 10 is an arbitrary number, not important for what we're discussing
    env_kwargs=dict(
        env=base_env,
        max_episode_steps=30,  # 30 is an arbitrary number, not important for what we're discussing
    ),
)
```

I think it's a bit cleaner than what was suggested in #477 and #789, and it might also be nice to reference this in the documentation. @araffin what do you think? If you agree, I can make a PR.
Wrappers for the training and eval envs were in the wrong order, so the `_elapsed_steps` attribute of the `TimeLimit` wrapper was a single one shared by all the instances of the env in the `VecEnv`. Now each instance is wrapped with its own `TimeLimit` instance, and each one of them has its own (independent) `_elapsed_steps`. See this discussion on GitHub: DLR-RM/stable-baselines3#894 (comment)
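The effect described in this commit message can be illustrated with a toy counter. `CounterEnv`, `ToyTimeLimit` and `steps_until_first_done` are hypothetical names for this sketch only, not gym's `TimeLimit` or SB3's `VecEnv`; the loop steps every sub-environment once per iteration, roughly as a vectorized env would, and reports when the first one signals done.

```python
class CounterEnv:
    """Hypothetical env that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}


class ToyTimeLimit:
    """Simplified time limit: done once its own _elapsed_steps reaches the limit."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            done = True
        return obs, reward, done, info


def steps_until_first_done(envs):
    """Step all envs in lockstep; return how many lockstep iterations pass until any reports done."""
    for env in envs:
        env.reset()
    n = 0
    while True:
        n += 1
        dones = [env.step(0)[2] for env in envs]
        if any(dones):
            return n


shared = ToyTimeLimit(CounterEnv(), max_episode_steps=30)
print(steps_until_first_done([shared] * 10))  # 3: one counter advanced by all 10 slots
independent = [ToyTimeLimit(CounterEnv(), max_episode_steps=30) for _ in range(10)]
print(steps_until_first_done(independent))  # 30: each slot has its own _elapsed_steps
```

With a single shared wrapper, the limit of 30 is hit after only 3 vectorized steps, because every slot increments the same `_elapsed_steps`.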
yes, it's by design, because the time limit wrapper is applied when calling

```python
from gym.wrappers import TimeLimit
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.monitor import Monitor


def my_wrapper(env):
    env = TimeLimit(env, 100)  # new time limit
    env = Monitor(env)  # wrap it with Monitor again to explicitly take the change into account
    return env


make_vec_env(..., wrapper_class=my_wrapper)
```

or use or even (probably the cleanest solution):

```python
def make_env():
    env = MyEnv()
    env = TimeLimit(env, 100)  # new time limit
    return env


vec_env = make_vec_env(make_env, ...)
```

I would be happy to receive a PR that adds a note in the doc linking to the different issues.
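Why the `make_env` factory avoids shared state can be sketched in plain Python. `ToyEnv`, `ToyTimeLimit` and `toy_make_vec_env` are hypothetical stand-ins, not SB3's `make_vec_env` (which additionally handles seeding, `Monitor` wrapping and the choice of `VecEnv` class); the only behavior carried over is that a callable passed as the env id is called once per sub-environment.

```python
class ToyEnv:
    """Hypothetical placeholder for MyEnv."""


class ToyTimeLimit:
    """Hypothetical stand-in for TimeLimit; only remembers what it wraps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps


def toy_make_vec_env(env_fn, n_envs):
    """Call env_fn once per sub-environment, like a factory-based vec env builder."""
    return [env_fn() for _ in range(n_envs)]


def make_env():
    env = ToyEnv()  # a fresh base env on every call
    return ToyTimeLimit(env, 100)  # and a fresh time limit around it


envs = toy_make_vec_env(make_env, n_envs=4)
print(len({id(e) for e in envs}), len({id(e.env) for e in envs}))  # 4 4
```

Because both the base env and its `TimeLimit` are created inside the factory, every sub-environment gets its own objects. If a single env object were created outside the factory and handed to every call, all sub-environments would instead wrap that same object.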
That looks good! @araffin if you agree, in my PR I could change that too
well, more
Please do =)
Wouldn't that not "accept" strings anymore?
I meant
Opened PR #1085
Question
Hi all,
I would like to know why `"episode" in info.keys()` corresponds to the true end of an episode.

The above code from `evaluation.py` indicates that `"episode" in info.keys()` is used to determine the true end of an episode, in case the Atari wrapper sends a "done" signal when the agent loses a life. However, in `monitor.py`, episode info is added to `info` whenever there is a "done" signal.

If the environment is wrapped with `EpisodicLifeEnv`, how is `"episode" in info.keys()` able to distinguish between a true termination state and losing a life? Does that mean `"episode" in info.keys()` always evaluates to `True` for `Monitor`-wrapped environments when `done = True`?
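The distinction asked about here can be sketched as a small pure function. `collect_episode_returns` is a hypothetical helper, a simplified illustration of the idea rather than the actual `evaluation.py` code: when a monitor is present, only a `done` whose `info` carries `"episode"` closes an episode, and the return is read from `info["episode"]["r"]`; without a monitor, every `done` is counted. As the wrapper-order outputs in this thread show, this only separates life losses from game over when the monitor sits inside the life-loss wrapper.

```python
def collect_episode_returns(trace, monitor_wrapped):
    """trace: list of (reward, done, info) tuples from consecutive steps."""
    returns = []
    running = 0.0
    for reward, done, info in trace:
        running += reward
        if not done:
            continue
        if monitor_wrapped:
            # Trust the monitor: a done without "episode" is not a true episode end
            # (e.g. a life loss signalled by a wrapper outside the monitor).
            if "episode" in info:
                returns.append(info["episode"]["r"])
        else:
            # No monitor: every done counts, so life losses look like episode ends.
            returns.append(running)
        running = 0.0


    return returns


# Monitor inside the life-loss wrapper: three dones, only the last one is game over.
trace = [
    (1.0, False, {"lives": 3}),
    (1.0, True, {"lives": 2}),
    (1.0, False, {"lives": 2}),
    (1.0, True, {"lives": 1}),
    (1.0, False, {"lives": 1}),
    (1.0, True, {"lives": 0, "episode": {"r": 6.0, "l": 6}}),
]
print(collect_episode_returns(trace, monitor_wrapped=True))  # [6.0]
print(collect_episode_returns(trace, monitor_wrapped=False))  # [2.0, 2.0, 2.0]
```

So `"episode" in info` is not true for every `done` in general; it depends on whether the monitor sees the life-loss `done` at all.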