Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removed calls to reset from init #2394

Merged
merged 11 commits into from
Sep 16, 2021
Merged

Conversation

ahmedo42
Copy link
Contributor

@ahmedo42 ahmedo42 commented Sep 5, 2021

fixes #2242

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 5, 2021

@ahmedo42 can you please make tests pass?

@ahmedo42
Copy link
Contributor Author

ahmedo42 commented Sep 5, 2021

test_record_video_using_default_trigger is flaky , I'll look into it

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 6, 2021

@ahmedo42 thanks a ton. Just to confirm I'm not going crazy here, this shouldn't require a version bump right?

@ahmedo42
Copy link
Contributor Author

ahmedo42 commented Sep 6, 2021

@jkterry1 it shouldn't since it's not a bug fix and behavior is the same ( I think )

Copy link
Contributor

@vwxyzjn vwxyzjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix and sorry for the flaky test. If this does not fix I think going forward I will create a dummy end that only outputs black rgb pixels for deterministic test

gym/wrappers/test_record_video.py Outdated Show resolved Hide resolved
@RedTachyon
Copy link
Contributor

The behavior is not exactly the same. The big difference is that every environment will need to be reset'd before using it.

With envs made by gym.make, typically TimeLimit seems to enforce this.

If you do e.g. env = BipedalWalker(), you can instantly call env.step and it will work normally.

With this PR, my guess is that it will crash in an unpredictable way (some variable or field will be undefined)

@RedTachyon
Copy link
Contributor

Specific example:

import gym
from gym.envs.box2d import BipedalWalker
env = BipedalWalker()
env.step(env.action_space.sample())

Output with this PR:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-38fa47b0972b> in <module>()
----> 1 env.step(env.action_space.sample())

/usr/local/lib/python3.7/dist-packages/gym/envs/box2d/bipedal_walker.py in step(self, action)
    410             self.joints[3].motorSpeed = float(SPEED_KNEE * np.clip(action[3], -1, 1))
    411         else:
--> 412             self.joints[0].motorSpeed = float(SPEED_HIP * np.sign(action[0]))
    413             self.joints[0].maxMotorTorque = float(
    414                 MOTORS_TORQUE * np.clip(np.abs(action[0]), 0, 1)

AttributeError: 'BipedalWalker' object has no attribute 'joints'

Output with gym==0.19.0:

(array([ 0.0127883 , -0.00783346,  0.00696973,  0.02636976, -0.27169684,
        -0.24278045,  1.46274191,  0.85486007,  1.        , -0.37531292,
        -0.69931543,  1.70809513,  0.99102688,  1.        ,  0.45706901,
         0.46225971,  0.47843772,  0.50760233,  0.55379778,  0.62467676,
         0.73529869,  0.9186005 ,  1.        ,  1.        ]),
 -0.06912166370948038,
 False,
 {})

(that is, normal output)

@ahmedo42
Copy link
Contributor Author

ahmedo42 commented Sep 6, 2021

@RedTachyon thanks for the catch , I think that you shouldn't be able to call env.step() directly without calling reset() first ( for all envs)

@RedTachyon
Copy link
Contributor

I lean towards not enforcing this in the general case. If a specific custom environment doesn't need resetting, why should that be required? I imagine this could be relevant in some lifelong learning-like scenarios. Not a big deal, but an additional pain with no real benefit.

As for the existing included environments, manual checks could be added in their respective classes I suppose, at the very least to have a sane error message.

But ultimately I don't really think the resets in the init are a big deal, they can stay imo.

@ahmedo42
Copy link
Contributor Author

ahmedo42 commented Sep 6, 2021

I see your point , the whole idea was to be consistent , some envs have reset in init but some don't which can be confusing , at least for me when I read the envs documentation

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 6, 2021

"I think that you shouldn't be able to call env.step() directly without calling reset() first ( for all envs)"

I agree with ahmedo42 on this for a variety of reasons. We should also specifically test for this in the API compliance tests, @ahmedo42 could you please add that real quick?

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 6, 2021

@ahmedo42 the environments should also issue a warning if you call step before reset that all environments inherit from the base class, like what PettingZoo does

@RedTachyon
Copy link
Contributor

Wouldn't this require adding verification logic to the currently completely abstract Env class? I think it's much better to just add it to the existing environments (and actually making it an exception, and not just a warning), but keep the base class as "clean" as possible.

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 6, 2021

It should be an exception not a warning yes, sorry. I don't understand your other concerns?

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Sep 6, 2021

Per fixes in #2401, could you remove the changes in the test_record_video.py?

@RedTachyon
Copy link
Contributor

My concern is about how the no-step-before-reset rule would be enforced. Currently it's in the TimeLimit wrapper. The only way I see to automatically enforce it in anything inheriting from gym.Env would be redefining the base reset function, which I think would be a pretty major change.

Unless I'm just misunderstanding and the suggestion is to add this check to the specific environments already within gym, which I think is fine.

@ahmedo42
Copy link
Contributor Author

ahmedo42 commented Sep 7, 2021

A solution that mimics pettingzoo would be like this :

def get_env():
    env = BipedalWalker()
    env = TimeLimit(env)
    return env

which will be implemented in every environment

from the user's side

import gym
from gym.envs.box2d import bipedal_walker
env = bipedal_walker.get_env()

Any cleaner solution is welcome , note that using gym.make() already throws the appropriate exception.

@benblack769
Copy link

benblack769 commented Sep 11, 2021

I belive that you can automatically wrap environments during environment make. For example, how TimeLimit is wrapped https://github.com/openai/gym/blob/master/gym/envs/registration.py#L109.

For environments which don't have a time limit, you can create a wrapper (perhaps EnforceCallOrder) specifically to enforce that reset() is called before step(), and wrap the environment similar to how the TimeLimit environment is wrapped.

Of course, there should be some way to disable this wrapping via the env spec, but it should probably be enabled by default.

from gym.wrappers.time_limit import TimeLimit

env = TimeLimit(env, max_episode_steps=env.spec.max_episode_steps)
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some way to disable wrapping the environment with the OrderEnforcing wrapper via the env.spec.

@benblack769
Copy link

Looks good, just one comment about allowing this to be disabled if someone really doesn't want their environment to be wrapped for some reason.

@benblack769
Copy link

I was thinking that the argument would be another optional argument to the EnvSpec class, similar to max_episode_steps, rather than the kwargs, which are really meant to be passed to the environment directly.

@ahmedo42
Copy link
Contributor Author

@benblack769 This was what I thought of initially but can't figure out a clean way to do it
what I want to do ideally is :

env = gym.make("someEnv",order_enforce=False) 
# but this is considered a kwarg for the env 

Is there a way to override the args of EnvSpec from gym.make() such as max_episode_steps , nondetermenistic etc..?

@benblack769
Copy link

benblack769 commented Sep 13, 2021

What I was thinking is that the environment user would not be thinking about this choice that much, this is a choice that the environment maintainer would be making, primarily. In particular, I am worried about the case where wrapping the environment by default would break a lot of downstream users code, so the environment maintainer could keep the current behavior by doing something like

register(
    id="CustomEnv-v1",
    entry_point="...",
    order_enforce=False,
)

@benblack769
Copy link

Thanks, I think this looks much better.

@jkterry1 jkterry1 merged commit 2754d97 into openai:master Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reset() calls from __init__ ?
5 participants