Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable #1774

Merged
merged 16 commits into from
Sep 13, 2021

Conversation

RaghuSpaceRajan
Copy link
Contributor

Since seed() is being called in default initialization of Space, it should be controllable for reproducibility.

Since seed() is being called in default initialization of Space, it should be controllable for reproducibility.
@pzhokhov
Copy link
Collaborator

pzhokhov commented Feb 10, 2020

I think the reproducibility of space.sample() can be achieved without this API change. Namely, because the constructors do not draw from the random number generator, you can do something like

space = spaces.Discrete(10) # using Discrete as an example, any subclass of space would work here
space.seed(123)
sample = space.sample() # always returns the same sample

The call to seed() in the Space constructor is merely so that user won't have to call space.seed() if reproducibility is not required (otherwise, self.np_random would be None, and sampling will fail).

@RaghuSpaceRajan
Copy link
Contributor Author

RaghuSpaceRajan commented Feb 10, 2020

I'm sorry I should have mentioned that I know that the seed for the space can be controlled. However, if you want to subclass the Space environment and do something in the __init__() function which depends on the seed, then you have to have the ugly workaround of instantiating an object of the subclass first and then calling seed() with a known seed and then having another secondary init() function which does the part of the initialisation that was dependent on the seed.

Additionally, it does waste some computation, however small that might be, to call seed() without an argument and then re-call it with an argument when reproducibility is required (which should be the case in the majority of use cases because reproducibility is such a key issue in RL).

@RaghuSpaceRajan
Copy link
Contributor Author

On a related note, exposing the seed to be controllable doesn't break any of the existing code since we pass in a default of None, but we might also want to consider passing in a default value such as 0 for the seed, so that in case, someone ran an experiment without paying heed to the seed, they can at least relaunch it, in case it turned up something interesting, especially if the experiment was expensive.

@RaghuSpaceRajan
Copy link
Contributor Author

I just realised that the seed dependent initialisation part could be done within seed() itself, or in reset() possibly (see discussion below), since these functions are always called for the environment before using it. I was taking the meanings of __init__() and seed() too literally and thus wanted to initialise only inside __init__() and only set seed within seed(). But that might be a matter of taste. So, I'm closing this issue.

The other 2 minor points I discussed (about wasting a small amount of computation and setting a default seed of, say, 0) still remain but I suppose they're not major enough for me and if you like we can re-open the issue and discuss those.

Another important point for anyone reading this discussion would be that, I think, we should not do seed dependent initialisation in reset() because that would probably lead to the same initial state every time the environment is reset and this would be bad for learning algorithms because we could have the same training trajectories for different episodes.

@RaghuSpaceRajan
Copy link
Contributor Author

RaghuSpaceRajan commented May 19, 2020

Hey, sorry, but I have to reopen this. We have a library which we want to release publicly: MDP Playground. We initialise the seed here. And after that we do a lot of init for the environment since we generate the transition and reward function for the environment randomly, for instance here. As I said, in my last comment, this part could be moved to seed(). But the initialisation of the environment components is not meant to be there. It should be in __init__(). Initially, I thought it would be okay to maintain a separate fork here and recommend users to use that. But I now realise that that's leading to conflicts if I keep the package name the same as gym. If I change the package name, then that leads to problems using my fork because some RL frameworks require gym compatibility.

Another reason to reopen is the new commit for controlling seeds of individual spaces within a Tuple. I'm creating Tuple environments which hold other Discrete environments (possibly of the same size) and each of those environments is required to have different dynamics, which is not possible if they are initialised with the same seed, as is done currently in seed().

This pull request is backwards compatible and improves the API for various reasons I have outlined in my previous comments and doesn't cause any failures for existing tests. Could you please accept it?

@pzhokhov
Copy link
Collaborator

pzhokhov commented Aug 14, 2020

Hi @RaghuSpaceRajan ! Sorry about the delay! I like the change about initializing Tuple space seed with an iterable (probably makes sense to add mirror version for Dict space - initialize it with a dict)
Still very confused about the need to include seed into Space constructor though. Specifically, I am proposing to replace this line: https://github.com/anonips/-MDP-Playground/blob/74431f98c210830a93a1bc83fcdcb95bf1644696/rl_toy/envs/rl_toy_env.py#L142
(which reads:

self.observation_space = DiscreteExtended(config["state_space_size"], seed=config["state_space_size"] + config["seed"]) #seed #hack #TODO Gym (and so Ray) apparently needs "observation"_space as a member. I'd prefer "state"_space

with

self.observation_space = Discrete(config["state_space_size"])
self.observation_space.seed(config["state_space_size"] + config["seed"])
  • think that should have the same result?
    Similarly, all other spaces can be seeded immediately after assignment.

@RaghuSpaceRajan
Copy link
Contributor Author

Hi @pzhokhov , thanks for the reply! That's a good idea for the Dict space. I'll add it.

Regarding the change you suggested for seeding Space subclasses immediately after init, doesn't it waste computation to call seed() twice? It is already called in Space.__init__(). It is admittedly, a small computation in most cases.

@RaghuSpaceRajan
Copy link
Contributor Author

Hi @pzhokhov I added the seeding for Dict as well as some test cases.

I see that this pull request makes the argument in my last comment about seeding twice moot and now we have to call seed() after the init in case we want an explicit seed. I still feel though that it's good to at least store the int, that is going to be used as the seed later, explicitly. This would help if someone wants to do some seed dependent initialisation for a Space within its init method. What do you think?

P.S.: I see that I confused Space and Env in one of my comments above. Sorry about that. I struck through that comment part.

@jkterry1
Copy link
Collaborator

Reviewer: @Bam4d

@RaghuSpaceRajan could you please fix the merge conflict if you see this?

@Bam4d
Copy link

Bam4d commented Aug 4, 2021

@jkterry1 @RaghuSpaceRajan It looks to me like this is a good ticket to merge, but not without a few changes. I agree that the Tuple and Dict space seed function should iteratively seed sub-spaces for reproducibility, but having the seed in __init__ is superflous and should be removed from this PR.

Copy link

@Bam4d Bam4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the seeds from the init functions, but keep the recursive seeding for dict and tuple spaces

@jkterry1
Copy link
Collaborator

jkterry1 commented Aug 5, 2021

I'm going to close this PR as it appears to have gone stale and I created an accompanying issue for adding these changes going forward. If anyone would be willing to create a new updated PR thats based off master and includes the fixes bam4d mentioned, I would greatly appreciate it.

@jkterry1 jkterry1 closed this Aug 5, 2021
@RaghuSpaceRajan
Copy link
Contributor Author

Hey, @jkterry1 , could you please re-open the issue? I've been sick lately, sorry. I'll get to it soon. The conflicts had looked fixable when I glanced through them.

I did have a question about seed in __init__ that wasn't answered above. Having the seed in __init__ would help if someone would want to do some seed dependent initialisation for a Space within its init method. What do you think about this?

@jkterry1 jkterry1 reopened this Aug 5, 2021
@jkterry1
Copy link
Collaborator

@Bam4d can you answer the question about the init method when you get a chance?

@Bam4d
Copy link

Bam4d commented Aug 11, 2021

I guess its a bit cleaner if the seed is in the __init__ as it reduces the function calls. I have no problem with this, I'm happy for it to be merged.

@benblack769
Copy link

benblack769 commented Aug 12, 2021

While the PR generally looks good, I think it needs better support for passing in a plain integer as a seed. I don't like the added complexity of having to pass in lots of different seeds for a single space. This sounds unnecessary to me. A better, simpler, more general solution to your problem that is easier for RL frameworks to handle, and API backwards compatible is to simply give different seeds to each sub-space by adding constants to the base seed.

For tuples, this is simple, you can use the same seeding strategy as is used in vector environments: https://github.com/openai/gym/blob/master/gym/vector/sync_vector_env.py#L54

def seed(self, seeds=None):
	if seeds is None:
		seeds = [None for _ in range(len(self.spaces))]
	if isinstance(seeds, int):
		seeds = [seeds + i for i in range(len(self.spaces))]
	assert len(seeds) == self.num_envs

	for space, seed in zip(self.spaces, seeds):
		space.seed(seed)

For dicts, things get a bit more subtle. If the key is orderable, you can just sort the list of items and do the same thing as for tuples. If the key is not orderable, you might be able to get a sufficiently good solution by adding the hash of the seed (of course, the hash is well defined otherwise the dict would not work).

def seed(self, seeds=None):
	if isinstance(seeds, int):
		seeds = {key: seed + hash(key) for key, space in self.spaces.items()}
	...

@Bam4d @RaghuSpaceRajan
Do you think this strategy is good?

@RaghuSpaceRajan
Copy link
Contributor Author

@benblack769 , the changes I made are already backwards compatible. If one doesn't pass a list of seeds for Tuple or a dict of seeds for Dict, the old procedure is used to seed.

Additionally, I'm no authority on this but I'm aware that linearly increasing seeds may not be the best idea (https://stackoverflow.com/a/1554995/11063709). It may/may not be a problem for RL envs, I can't say for sure.

@Bam4d Thanks for the input! I'll try to finish the PR today.

@RaghuSpaceRajan
Copy link
Contributor Author

Hi @Bam4d , I made the remaining changes and the tests are also passing. Please let me know if you think something's still missing.

@benblack769
Copy link

So the Gym developers were aware of the linearly increasing seed problem, and use a crytographic grade hash on the seed before putting it into numpy:
https://github.com/openai/gym/blob/master/gym/utils/seeding.py#L45
https://github.com/openai/gym/blob/master/gym/spaces/space.py#L43

@benblack769
Copy link

So this is not a problem. I think that the plan I laid out above is simple and much better than the current state of things.

@RaghuSpaceRajan
Copy link
Contributor Author

@benblack769, thanks for the links, good to know!

So, the difference in the current approach and what you propose is only when an int is passed, right?

And you propose that when an int is passed, even then different seeds should be used for seeding the sub-spaces.

I think there are advantages and disadvantages for the proposed approach:
Advantage: Easy to pass an int to let a vector (list/dict) of values be used. The vectors you suggested above are good defaults.
Disadvantage: Less intuitive that an int is "broadcast" to a vector of values non-trivially. Someone might expect the same int to be used as seed for all sub-spaces when passing a single int.

I personally am in favour of the current approach because it feels more intuitive to me that passing an int uses the same int for all sub-spaces, and because it takes just an extra line of code to go from the int to the vector of seed values that you suggested in your code.

Having said that, I'm fine with the new approach you proposed as well in case a majority is in favour of it. Maybe, @Bam4d can break the tie.

@jkterry1
Copy link
Collaborator

@RaghuSpaceRajan please do it Ben's way

@RaghuSpaceRajan
Copy link
Contributor Author

@jkterry1 @Bam4d @benblack769 #2365 has created merge conflicts with this PR.

#2365 tries to fix the sub-case of the issue here, when the seed passed would be an int. This conflicts with the way @benblack769 suggested to do it above - #2365 suggests randomly generating multiple seeds based on the passed seed while @benblack769 suggested adding either hash(key) (for a Dict) or sub-space index (for a Tuple) to the seed. So, which of these 2 methods should I use?

Additional info: #2365 also fixed the bug that seed() did not return a list for Dict and Tuple. I assume that's compatible with this issue. We, may, however want seed() to return a dict in case of Dict. What do you think?

@benblack769
Copy link

  1. Generating random seeds to populate the elements of a tuple env should be fine. I see no problem with that. For a dict env, I don't like it because dictionary order isn't a reliable property of a dictionary, only the set of keys is. Unfortunately, I've learned that hashing is also not reliable. So perhaps the order is the best that we can do, and the current implementation is fine.
  2. I think it would be best for seed() to return whatever the seed method accepts as its argument. If it accepts a dict, then return a dict.

@RaghuSpaceRajan
Copy link
Contributor Author

@benblack769 , thanks for the input!

Regarding point 2, I agree with you. However, the default return of seed() for Space (and so I assume for all spaces) is a list, even when an int is passed. So, do you have any thoughts on how we could make it consistent for all classes?

@RaghuSpaceRajan
Copy link
Contributor Author

So, I resolved the other issues. Still not sure about how to resolve the last issue (see previous comment). Why is even the default return of seed() for Space a list and not an int?

@RaghuSpaceRajan RaghuSpaceRajan changed the title Make the seed in default initialization controllable Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable Sep 7, 2021
@benblack769
Copy link

I have no idea. Like I said before, it makes most sense to me that what is returned from seed() can be passed back into it. @jkterry1 Is this something that can be changed or is it something that we just have to deal with?

@jkterry1
Copy link
Collaborator

@RaghuSpaceRajan after discussion I think the least bad option is to do a one time breaking change here such that seed always returns the same datastructure.

@RaghuSpaceRajan
Copy link
Contributor Author

@jkterry1 thanks for the input! seed() currently always returns a list of ints so I presume I do not need to change anything else now. Did I get that right?

@jkterry1
Copy link
Collaborator

@RaghuSpaceRajan I think so, but just to confirm the dtype of the output of seed here is the same as the input right?

@RaghuSpaceRajan
Copy link
Contributor Author

I think then that there has been a misundertanding. @jkterry1, the output of seed() is currently not the same for (almost) any of the classes. The default return of seed() for all spaces is a list, even when an int is passed (that's how it is tested here:

def test_seed_returns_list(space):
).

We would need to change Space to change the default return of seed() to an int so that it matches its argument. I could do that but I do not know if that might have any cascading effects beyond just failing the above test, e.g., someone using Gym might have written code that depends on the return of seed() being a list.

@jkterry1
Copy link
Collaborator

In this case I'm going to merge this and create a separate issue to figure out what seed should return in the future

@jkterry1 jkterry1 merged commit c571b0d into openai:master Sep 13, 2021
Comment on lines 56 to +91
def seed(self, seed=None):
seed = super().seed(seed)
try:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=False, # unique subseed for each subspace
)
except ValueError:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=True, # we get more than INT_MAX subspaces
)

for subspace, subseed in zip(self.spaces.values(), subseeds):
seed.append(subspace.seed(int(subseed))[0])

return seed
seeds = []
if isinstance(seed, dict):
for key, seed_key in zip(self.spaces, seed):
assert key == seed_key, print(
"Key value",
seed_key,
"in passed seed dict did not match key value",
key,
"in spaces Dict.",
)
seeds += self.spaces[key].seed(seed[seed_key])
elif isinstance(seed, int):
seeds = super().seed(seed)
try:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=False, # unique subseed for each subspace
)
except ValueError:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=True, # we get more than INT_MAX subspaces
)

for subspace, subseed in zip(self.spaces.values(), subseeds):
seeds.append(subspace.seed(int(subseed))[0])
elif seed is None:
for space in self.spaces.values():
seeds += space.seed(seed)
else:
raise TypeError("Passed seed not of an expected type: dict or int or None")

return seeds
Copy link
Contributor

@XuehaiPan XuehaiPan Sep 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should we call super().seed(seed) to seed self.np_random as well when seed is dict or None here?

  2. We could merge the last two cases (seed is int or None) into one. After statement seeds = super().seed(seed), self.np_random become seeded and variable seeds is a list of integers.

  3. Should we only add the main seed of subspace (subspace.seed()[0]) to the return value rather than extend all items in subspace.seed() (the op seeds += in the first and the last cases)?
    If the subpace of Dict here is another compound space (Tuple or Dict), the length of the return list could be different len(d.seed(None)) != len(d.seed(0)). (However, I think we probably only use the first item, and hardly ever use the rest in the list and the length of the list. This issue probably never affects normal users.)

    def test_seed_returns_list(space):
    def assert_integer_list(seed):
    assert isinstance(seed, list)
    assert len(seed) >= 1
    assert all([isinstance(s, int) for s in seed])
    assert_integer_list(space.seed(None))
    assert_integer_list(space.seed(0))

    We could add a new test assert len(space.seed(None)) == len(space.seed(0)) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'd say it causes minimal harm and could be done. Doing this would create the _np_random PRNG of the "base" class's object and make it consistent with other spaces in case all Space objects are expected to have an _np_random that is not None, although I assume that only _np_randoms of the sub-classes are ever used.
  2. It's subjective. For the case, when it's None, I assume the user would want the seed of every sub-space to be as random as possible in the sense that they would want all of them to be seeded with None. But generating the seeds for sub-spaces using an _np_random PRNG generated based on None (i.e., the suggested way) should also be fine.
  3. I don't think that there's an expected fixed length of the list of seeds returned, so I do not think it makes a difference either way. In any case, the length of the list of seeds returned can always be calculated if the sub-spaces are known, in case a user really wants dig deeper.

So, in general, I'm fine with any of the suggestions either being or not being implemented (also for the suggestions on Tuple below). Please let me know.

Comment on lines 21 to +50
def seed(self, seed=None):
seed = super().seed(seed)
try:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=False, # unique subseed for each subspace
)
except ValueError:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=True, # we get more than INT_MAX subspaces
)
seeds = []

if isinstance(seed, list):
for i, space in enumerate(self.spaces):
seeds += space.seed(seed[i])
elif isinstance(seed, int):
seeds = super().seed(seed)
try:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=False, # unique subseed for each subspace
)
except ValueError:
subseeds = self.np_random.choice(
np.iinfo(int).max,
size=len(self.spaces),
replace=True, # we get more than INT_MAX subspaces
)

for subspace, subseed in zip(self.spaces, subseeds):
seed.append(subspace.seed(int(subseed))[0])
for subspace, subseed in zip(self.spaces, subseeds):
seeds.append(subspace.seed(int(subseed))[0])
elif seed is None:
for space in self.spaces:
seeds += space.seed(seed)
else:
raise TypeError("Passed seed not of an expected type: list or int or None")

return seed
return seeds
Copy link
Contributor

@XuehaiPan XuehaiPan Sep 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here for Tuple (see Dict above).

  1. Call super().seed(seed).
  2. Merge cases seed is int and seed is None.
  3. Only add the main seed of the subspace to the return value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants