Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable #1774
Since seed() is being called in default initialization of Space, it should be controllable for reproducibility.
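A minimal sketch of what a controllable seed at construction time could look like. `ToySpace` and `ToyDiscrete` are hypothetical stand-ins written for this illustration, not Gym's classes, and they use the standard-library `random` module rather than NumPy.

```python
import random


class ToySpace:
    """Hypothetical base class: the seed can be passed at construction time."""

    def __init__(self, seed=None):
        self._rng = None
        self.seed(seed)

    def seed(self, seed=None):
        if seed is None:
            # No seed requested: draw one from the OS entropy source.
            seed = random.SystemRandom().randrange(2**31)
        self._rng = random.Random(seed)
        return [seed]


class ToyDiscrete(ToySpace):
    """Hypothetical discrete space over {0, ..., n - 1}."""

    def __init__(self, n, seed=None):
        self.n = n
        super().__init__(seed)

    def sample(self):
        return self._rng.randrange(self.n)


# Seeding through the constructor ...
one_step = ToyDiscrete(10, seed=123)

# ... yields the same stream as constructing first and seeding afterwards.
two_step = ToyDiscrete(10)
two_step.seed(123)
```

In this sketch both objects produce the same sample sequence; the only difference is that the two-step form seeds the generator twice.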
I think the reproducibility of
    space = spaces.Discrete(10)  # using Discrete as an example, any subclass of space would work here
    space.seed(123)
    sample = space.sample()  # always returns the same sample
The call to seed() in the |
I'm sorry I should have mentioned that I know that the seed for the space can be controlled. However, if you want to subclass the Additionally, it does waste some computation, however small that might be, to call |
On a related note, exposing the seed to be controllable doesn't break any of the existing code since we pass in a default of |
I just realised that the seed dependent initialisation part could be done within The other 2 minor points I discussed (about wasting a small amount of computation and setting a default seed of, say, 0) still remain but I suppose they're not major enough for me and if you like we can re-open the issue and discuss those. Another important point for anyone reading this discussion would be that, I think, we should not do seed dependent initialisation in |
Hey, sorry, but I have to reopen this. Another reason to reopen is the new commit for controlling seeds of individual spaces within a This pull request is backwards compatible and improves the API for various reasons I have outlined in my previous comments and doesn't cause any failures for existing tests. Could you please accept it? |
Hi @RaghuSpaceRajan ! Sorry about the delay! I like the change about initializing Tuple space seed with an iterable (probably makes sense to add mirror version for Dict space - initialize it with a dict)
    self.observation_space = DiscreteExtended(config["state_space_size"], seed=config["state_space_size"] + config["seed"])  #seed #hack #TODO Gym (and so Ray) apparently needs "observation"_space as a member. I'd prefer "state"_space
with
    self.observation_space = Discrete(config["state_space_size"])
    self.observation_space.seed(config["state_space_size"] + config["seed"])
|
Hi @pzhokhov , thanks for the reply! That's a good idea for the Regarding the change you suggested for seeding |
Hi @pzhokhov I added the seeding for I see that this pull request makes the argument in my last comment about seeding twice moot and now we have to call P.S.: I see that I confused |
Reviewer: @Bam4d @RaghuSpaceRajan could you please fix the merge conflict if you see this? |
@jkterry1 @RaghuSpaceRajan It looks to me like this is a good ticket to merge, but not without a few changes. I agree that the Tuple and Dict space |
remove the seeds from the init functions, but keep the recursive seeding for dict and tuple spaces
I'm going to close this PR as it appears to have gone stale and I created an accompanying issue for adding these changes going forward. If anyone would be willing to create a new updated PR that's based off master and includes the fixes bam4d mentioned, I would greatly appreciate it. |
Hey, @jkterry1 , could you please re-open the issue? I've been sick lately, sorry. I'll get to it soon. The conflicts had looked fixable when I glanced through them. I did have a question about |
@Bam4d can you answer the question about the init method when you get a chance? |
I guess it's a bit cleaner if the seed is in the |
While the PR generally looks good, I think it needs better support for passing in a plain integer as a seed. I don't like the added complexity of having to pass in lots of different seeds for a single space. This sounds unnecessary to me. A better, simpler, more general solution to your problem that is easier for RL frameworks to handle, and API backwards compatible is to simply give different seeds to each sub-space by adding constants to the base seed. For tuples, this is simple, you can use the same seeding strategy as is used in vector environments: https://github.com/openai/gym/blob/master/gym/vector/sync_vector_env.py#L54
For dicts, things get a bit more subtle. If the keys are orderable, you can just sort the list of items and do the same thing as for tuples. If the keys are not orderable, you might be able to get a sufficiently good solution by adding the hash of the key to the base seed (of course, the hash is well defined, otherwise the dict would not work).
@Bam4d @RaghuSpaceRajan |
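The strategy described in this comment can be sketched roughly as follows. The helper names `tuple_subseeds` and `dict_subseeds` are hypothetical and exist only for this illustration; the sketch reads "adding constants to the base seed" as base seed plus position index.

```python
def tuple_subseeds(base_seed, n):
    """Give the i-th sub-space the seed base_seed + i."""
    return [base_seed + i for i in range(n)]


def dict_subseeds(base_seed, keys):
    """Map each key to a sub-seed derived from a single base seed."""
    keys = list(keys)
    try:
        ordered = sorted(keys)
    except TypeError:
        # Keys are not mutually orderable: fall back to base_seed + hash(key).
        # Caveat: hash() of str/bytes keys changes between interpreter runs
        # unless PYTHONHASHSEED is fixed, and the result may be negative.
        return {key: base_seed + hash(key) for key in keys}
    # Orderable keys: sort them and treat the dict like a tuple.
    return {key: base_seed + i for i, key in enumerate(ordered)}


print(tuple_subseeds(10, 3))
print(dict_subseeds(10, {"velocity": None, "position": None}))
```

Sorting makes the assignment independent of insertion order, which is what keeps a single integer seed reproducible for dict-like containers in this sketch.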
@benblack769 , the changes I made are already backwards compatible. If one doesn't pass a list of seeds for Additionally, I'm no authority on this but I'm aware that linearly increasing seeds may not be the best idea (https://stackoverflow.com/a/1554995/11063709). It may/may not be a problem for RL envs, I can't say for sure. @Bam4d Thanks for the input! I'll try to finish the PR today. |
Hi @Bam4d , I made the remaining changes and the tests are also passing. Please let me know if you think something's still missing. |
So the Gym developers were aware of the linearly increasing seed problem, and use a cryptographic-grade hash on the seed before putting it into numpy: |
So this is not a problem. I think that the plan I laid out above is simple and much better than the current state of things. |
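To illustrate the general idea of hashing a seed before it reaches the generator, here is a minimal sketch. It is not Gym's code: the helper name `hashed_seed`, the use of SHA-512, and the truncation to 8 bytes are assumptions made for this example.

```python
import hashlib


def hashed_seed(seed, max_bytes=8):
    """Scramble an integer seed so that nearby seeds map to unrelated values."""
    digest = hashlib.sha512(str(seed).encode("utf8")).digest()
    # Keep only the first max_bytes bytes and read them as an unsigned integer.
    return int.from_bytes(digest[:max_bytes], "big")


# Consecutive base seeds no longer produce consecutive generator seeds.
for base in (0, 1, 2):
    print(base, hashed_seed(base))
```

With this kind of preprocessing, handing out `seed`, `seed + 1`, `seed + 2` to sub-spaces does not feed linearly related values into the underlying generator.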
@benblack769, thanks for the links, good to know! So, the difference in the current approach and what you propose is only when an And you propose that when an I think there are advantages and disadvantages for the proposed approach: I personally am in favour of the current approach because it feels more intuitive to me that passing an Having said that, I'm fine with the new approach you proposed as well in case a majority is in favour of it. Maybe, @Bam4d can break the tie. |
@RaghuSpaceRajan please do it Ben's way |
@jkterry1 @Bam4d @benblack769 #2365 has created merge conflicts with this PR. #2365 tries to fix the sub-case of the issue here, when the seed passed would be an Additional info: #2365 also fixed the bug that |
|
@benblack769 , thanks for the input! Regarding point 2, I agree with you. However, the default return of |
So, I resolved the other issues. Still not sure about how to resolve the last issue (see previous comment). Why is even the default return of |
I have no idea. Like I said before, it makes most sense to me that what is returned from seed() can be passed back into it. @jkterry1 Is this something that can be changed or is it something that we just have to deal with? |
@RaghuSpaceRajan after discussion I think the least bad option is to do a one time breaking change here such that seed always returns the same datastructure. |
@jkterry1 thanks for the input! |
@RaghuSpaceRajan I think so, but just to confirm the dtype of the output of seed here is the same as the input right? |
I think then that there has been a misunderstanding. @jkterry1, the output of gym/gym/spaces/tests/test_spaces.py Line 224 in 89c8bfb
We would need to change |
In this case I'm going to merge this and create a separate issue to figure out what seed should return in the future |
     def seed(self, seed=None):
    -    seed = super().seed(seed)
    -    try:
    -        subseeds = self.np_random.choice(
    -            np.iinfo(int).max,
    -            size=len(self.spaces),
    -            replace=False,  # unique subseed for each subspace
    -        )
    -    except ValueError:
    -        subseeds = self.np_random.choice(
    -            np.iinfo(int).max,
    -            size=len(self.spaces),
    -            replace=True,  # we get more than INT_MAX subspaces
    -        )
    -
    -    for subspace, subseed in zip(self.spaces.values(), subseeds):
    -        seed.append(subspace.seed(int(subseed))[0])
    -
    -    return seed
    +    seeds = []
    +    if isinstance(seed, dict):
    +        for key, seed_key in zip(self.spaces, seed):
    +            assert key == seed_key, print(
    +                "Key value",
    +                seed_key,
    +                "in passed seed dict did not match key value",
    +                key,
    +                "in spaces Dict.",
    +            )
    +            seeds += self.spaces[key].seed(seed[seed_key])
    +    elif isinstance(seed, int):
    +        seeds = super().seed(seed)
    +        try:
    +            subseeds = self.np_random.choice(
    +                np.iinfo(int).max,
    +                size=len(self.spaces),
    +                replace=False,  # unique subseed for each subspace
    +            )
    +        except ValueError:
    +            subseeds = self.np_random.choice(
    +                np.iinfo(int).max,
    +                size=len(self.spaces),
    +                replace=True,  # we get more than INT_MAX subspaces
    +            )
    +
    +        for subspace, subseed in zip(self.spaces.values(), subseeds):
    +            seeds.append(subspace.seed(int(subseed))[0])
    +    elif seed is None:
    +        for space in self.spaces.values():
    +            seeds += space.seed(seed)
    +    else:
    +        raise TypeError("Passed seed not of an expected type: dict or int or None")
    +
    +    return seeds
- Should we call `super().seed(seed)` to seed `self.np_random` as well when `seed` is a `dict` or `None` here?
- We could merge the last two cases (`seed` is `int` or `None`) into one. After the statement `seeds = super().seed(seed)`, `self.np_random` is seeded and the variable `seeds` is a list of integers.
- Should we only add the main seed of the subspace (`subspace.seed()[0]`) to the return value rather than extend it with all items in `subspace.seed()` (the `seeds +=` operation in the first and the last cases)? If a subspace of `Dict` here is another compound space (`Tuple` or `Dict`), the length of the returned list could differ between calls: `len(d.seed(None)) != len(d.seed(0))`. (However, I think we probably only use the first item, and hardly ever use the rest of the list or its length. This issue probably never affects normal users.)

gym/gym/spaces/tests/test_spaces.py, lines 270 to 277 in d35d211:

    def test_seed_returns_list(space):
        def assert_integer_list(seed):
            assert isinstance(seed, list)
            assert len(seed) >= 1
            assert all([isinstance(s, int) for s in seed])

        assert_integer_list(space.seed(None))
        assert_integer_list(space.seed(0))

We could add a new test, `assert len(space.seed(None)) == len(space.seed(0))`, here.
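The length mismatch raised in the third point can be reproduced with a small toy model. `ToyLeaf` and `ToyDict` are hypothetical classes that only mirror the branching of the `Dict.seed` diff under review; they use the standard-library `random` module instead of `np_random` and are not Gym code.

```python
import random


class ToyLeaf:
    """Hypothetical simple space: seed() returns a one-element list."""

    def seed(self, seed=None):
        if seed is None:
            seed = random.SystemRandom().randrange(2**31)
        self.rng = random.Random(seed)
        return [seed]


class ToyDict:
    """Hypothetical compound space following the three branches of the diff."""

    def __init__(self, spaces):
        self.spaces = spaces

    def seed(self, seed=None):
        seeds = []
        if isinstance(seed, dict):
            for key in self.spaces:
                seeds += self.spaces[key].seed(seed[key])
        elif isinstance(seed, int):
            seeds = [seed]
            rng = random.Random(seed)
            subseeds = rng.sample(range(2**31), len(self.spaces))
            for subspace, subseed in zip(self.spaces.values(), subseeds):
                # Only the main seed of each subspace is kept.
                seeds.append(subspace.seed(subseed)[0])
        elif seed is None:
            for subspace in self.spaces.values():
                # Every seed of every subspace is kept.
                seeds += subspace.seed(seed)
        else:
            raise TypeError("seed must be a dict, an int, or None")
        return seeds


inner = ToyDict({"x": ToyLeaf(), "y": ToyLeaf(), "z": ToyLeaf()})
nested = ToyDict({"a": ToyLeaf(), "inner": inner})

print(len(nested.seed(None)))  # flattened: one entry per leaf
print(len(nested.seed(0)))     # own seed plus one entry per direct subspace
```

In this toy model the `None` branch returns four seeds (one per leaf) while the `int` branch returns three (the container's seed plus one per direct subspace), which is the inconsistency the proposed extra test would catch.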
- I'd say it causes minimal harm and could be done. Doing this would create the `_np_random` PRNG of the "base" class's object and make it consistent with other spaces in case all `Space` objects are expected to have an `_np_random` that is not `None`, although I assume that only the `_np_random`s of the sub-classes are ever used.
- It's subjective. For the case when it's `None`, I assume the user would want the seed of every sub-space to be as random as possible, in the sense that they would want all of them to be seeded with `None`. But generating the seeds for sub-spaces using an `_np_random` PRNG generated based on `None` (i.e., the suggested way) should also be fine.
- I don't think that there's an expected fixed length of the list of seeds returned, so I do not think it makes a difference either way. In any case, the length of the list of seeds returned can always be calculated if the sub-spaces are known, in case a user really wants to dig deeper.

So, in general, I'm fine with any of the suggestions either being or not being implemented (also for the suggestions on `Tuple` below). Please let me know.
     def seed(self, seed=None):
    -    seed = super().seed(seed)
    -    try:
    -        subseeds = self.np_random.choice(
    -            np.iinfo(int).max,
    -            size=len(self.spaces),
    -            replace=False,  # unique subseed for each subspace
    -        )
    -    except ValueError:
    -        subseeds = self.np_random.choice(
    -            np.iinfo(int).max,
    -            size=len(self.spaces),
    -            replace=True,  # we get more than INT_MAX subspaces
    -        )
    +    seeds = []
    +
    +    if isinstance(seed, list):
    +        for i, space in enumerate(self.spaces):
    +            seeds += space.seed(seed[i])
    +    elif isinstance(seed, int):
    +        seeds = super().seed(seed)
    +        try:
    +            subseeds = self.np_random.choice(
    +                np.iinfo(int).max,
    +                size=len(self.spaces),
    +                replace=False,  # unique subseed for each subspace
    +            )
    +        except ValueError:
    +            subseeds = self.np_random.choice(
    +                np.iinfo(int).max,
    +                size=len(self.spaces),
    +                replace=True,  # we get more than INT_MAX subspaces
    +            )

    -    for subspace, subseed in zip(self.spaces, subseeds):
    -        seed.append(subspace.seed(int(subseed))[0])
    +        for subspace, subseed in zip(self.spaces, subseeds):
    +            seeds.append(subspace.seed(int(subseed))[0])
    +    elif seed is None:
    +        for space in self.spaces:
    +            seeds += space.seed(seed)
    +    else:
    +        raise TypeError("Passed seed not of an expected type: list or int or None")

    -    return seed
    +    return seeds
Same comment here for `Tuple` (see `Dict` above).
- Call `super().seed(seed)`.
- Merge the cases `seed` is `int` and `seed` is `None`.
- Only add the main seed of the subspace to the return value.
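One way the three suggestions above might fit together is sketched below. `ToySpace` and `ToyTuple` are hypothetical classes built on the standard-library `random` module, not Gym's implementation. How the container's own generator should be seeded when a list is passed is not settled in the review; drawing a fresh seed for it is an assumption of this sketch.

```python
import random


class ToySpace:
    """Hypothetical base space: seed() stores a generator and returns [seed]."""

    def seed(self, seed=None):
        if seed is None:
            seed = random.SystemRandom().randrange(2**31)
        self.rng = random.Random(seed)
        return [seed]


class ToyTuple(ToySpace):
    """Hypothetical tuple space applying the three review suggestions."""

    def __init__(self, spaces):
        self.spaces = tuple(spaces)

    def seed(self, seed=None):
        if isinstance(seed, list):
            # Assumption: with explicit sub-seeds, the container's own
            # generator still gets seeded, here with a fresh random seed.
            seeds = super().seed(None)
            subseeds = seed
        elif seed is None or isinstance(seed, int):
            # Merged int/None case: seed the container first, then draw
            # distinct sub-seeds from its generator.
            seeds = super().seed(seed)
            subseeds = self.rng.sample(range(2**31), len(self.spaces))
        else:
            raise TypeError("seed must be a list, an int, or None")
        for subspace, subseed in zip(self.spaces, subseeds):
            # Keep only the main seed of each subspace.
            seeds.append(subspace.seed(subseed)[0])
        return seeds


nested = ToyTuple([ToySpace(), ToyTuple([ToySpace(), ToySpace(), ToySpace()])])
print(len(nested.seed(None)), len(nested.seed(0)))
```

Under these assumptions the returned list always has one entry for the container plus one per direct subspace, so its length no longer depends on whether the seed was `None`, an `int`, or a list.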