-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
removed calls to reset from init #2394
Conversation
@ahmedo42 can you please make tests pass? |
|
@ahmedo42 thanks a ton. Just to confirm I'm not going crazy here, this shouldn't require a version bump right? |
@jkterry1 it shouldn't since it's not a bug fix and behavior is the same ( I think ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix and sorry for the flaky test. If this does not fix I think going forward I will create a dummy end that only outputs black rgb pixels for deterministic test
The behavior is not exactly the same. The big difference is that every environment will need to be reset'd before using it. With envs made by If you do e.g. With this PR, my guess is that it will crash in an unpredictable way (some variable or field will be undefined) |
Specific example: import gym
from gym.envs.box2d import BipedalWalker
env = BipedalWalker()
env.step(env.action_space.sample()) Output with this PR:
Output with gym==0.19.0: (array([ 0.0127883 , -0.00783346, 0.00696973, 0.02636976, -0.27169684,
-0.24278045, 1.46274191, 0.85486007, 1. , -0.37531292,
-0.69931543, 1.70809513, 0.99102688, 1. , 0.45706901,
0.46225971, 0.47843772, 0.50760233, 0.55379778, 0.62467676,
0.73529869, 0.9186005 , 1. , 1. ]),
-0.06912166370948038,
False,
{}) (that is, normal output) |
@RedTachyon thanks for the catch , I think that you shouldn't be able to call |
I lean towards not enforcing this in the general case. If a specific custom environment doesn't need resetting, why should that be required? I imagine this could be relevant in some lifelong learning-like scenarios. Not a big deal, but an additional pain with no real benefit. As for the existing included environments, manual checks could be added in their respective classes I suppose, at the very least to have a sane error message. But ultimately I don't really think the resets in the init are a big deal, they can stay imo. |
I see your point , the whole idea was to be consistent , some envs have |
"I think that you shouldn't be able to call env.step() directly without calling reset() first ( for all envs)" I agree with ahmedo42 on this for a variety of reasons. We should also specifically test for this in the API compliance tests, @ahmedo42 could you please add that real quick? |
@ahmedo42 the environments should also issue a warning if you call step before reset that all environments inherit from the base class, like what PettingZoo does |
Wouldn't this require adding verification logic to the currently completely abstract Env class? I think it's much better to just add it to the existing environments (and actually making it an exception, and not just a warning), but keep the base class as "clean" as possible. |
It should be an exception not a warning yes, sorry. I don't understand your other concerns? |
Per fixes in #2401, could you remove the changes in the |
My concern is about how the no-step-before-reset rule would be enforced. Currently it's in the TimeLimit wrapper. The only way I see to automatically enforce it in anything inheriting from Unless I'm just misunderstanding and the suggestion is to add this check to the specific environments already within gym, which I think is fine. |
A solution that mimics pettingzoo would be like this : def get_env():
env = BipedalWalker()
env = TimeLimit(env)
return env which will be implemented in every environment from the user's side import gym
from gym.envs.box2d import bipedal_walker
env = bipedal_walker.get_env() Any cleaner solution is welcome , note that using |
I belive that you can automatically wrap environments during environment make. For example, how TimeLimit is wrapped https://github.com/openai/gym/blob/master/gym/envs/registration.py#L109. For environments which don't have a time limit, you can create a wrapper (perhaps Of course, there should be some way to disable this wrapping via the env spec, but it should probably be enabled by default. |
from gym.wrappers.time_limit import TimeLimit | ||
|
||
env = TimeLimit(env, max_episode_steps=env.spec.max_episode_steps) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be some way to disable wrapping the environment with the OrderEnforcing wrapper via the env.spec.
Looks good, just one comment about allowing this to be disabled if someone really doesn't want their environment to be wrapped for some reason. |
I was thinking that the argument would be another optional argument to the EnvSpec class, similar to |
@benblack769 This was what I thought of initially but can't figure out a clean way to do it env = gym.make("someEnv",order_enforce=False)
# but this is considered a kwarg for the env Is there a way to override the args of |
What I was thinking is that the environment user would not be thinking about this choice that much, this is a choice that the environment maintainer would be making, primarily. In particular, I am worried about the case where wrapping the environment by default would break a lot of downstream users code, so the environment maintainer could keep the current behavior by doing something like
|
Thanks, I think this looks much better. |
fixes #2242