Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable #1774

RaghuSpaceRajan · 2019-12-31T15:00:04Z

Since seed() is being called in default initialization of Space, it should be controllable for reproducibility.

…initialization.

pzhokhov · 2020-02-10T10:34:31Z

I think the reproducibility of space.sample() can be achieved without this API change. Namely, because the constructors do not draw from the random number generator, you can do something like

space = spaces.Discrete(10) # using Discrete as an example, any subclass of space would work here
space.seed(123)
sample = space.sample() # always returns the same sample

The call to seed() in the Space constructor is merely so that user won't have to call space.seed() if reproducibility is not required (otherwise, self.np_random would be None, and sampling will fail).

RaghuSpaceRajan · 2020-02-10T20:05:26Z

I'm sorry I should have mentioned that I know that the seed for the space can be controlled. However, if you want to subclass the Space environment and do something in the __init__() function which depends on the seed, then you have to have the ugly workaround of instantiating an object of the subclass first and then calling seed() with a known seed and then having another secondary init() function which does the part of the initialisation that was dependent on the seed.

Additionally, it does waste some computation, however small that might be, to call seed() without an argument and then re-call it with an argument when reproducibility is required (which should be the case in the majority of use cases because reproducibility is such a key issue in RL).

RaghuSpaceRajan · 2020-02-10T20:08:18Z

On a related note, exposing the seed to be controllable doesn't break any of the existing code since we pass in a default of None, but we might also want to consider passing in a default value such as 0 for the seed, so that in case, someone ran an experiment without paying heed to the seed, they can at least relaunch it, in case it turned up something interesting, especially if the experiment was expensive.

RaghuSpaceRajan · 2020-02-19T14:52:00Z

I just realised that the seed dependent initialisation part could be done within seed() itself, or in reset() possibly (see discussion below), since these functions are always called for the environment before using it. I was taking the meanings of __init__() and seed() too literally and thus wanted to initialise only inside __init__() and only set seed within seed(). But that might be a matter of taste. So, I'm closing this issue.

The other 2 minor points I discussed (about wasting a small amount of computation and setting a default seed of, say, 0) still remain but I suppose they're not major enough for me and if you like we can re-open the issue and discuss those.

Another important point for anyone reading this discussion would be that, I think, we should not do seed dependent initialisation in reset() because that would probably lead to the same initial state every time the environment is reset and this would be bad for learning algorithms because we could have the same training trajectories for different episodes.

RaghuSpaceRajan · 2020-05-19T21:04:22Z

Hey, sorry, but I have to reopen this. We have a library which we want to release publicly: MDP Playground. We initialise the seed here. And after that we do a lot of init for the environment since we generate the transition and reward function for the environment randomly, for instance here. As I said, in my last comment, this part could be moved to seed(). But the initialisation of the environment components is not meant to be there. It should be in __init__(). Initially, I thought it would be okay to maintain a separate fork here and recommend users to use that. But I now realise that that's leading to conflicts if I keep the package name the same as gym. If I change the package name, then that leads to problems using my fork because some RL frameworks require gym compatibility.

Another reason to reopen is the new commit for controlling seeds of individual spaces within a Tuple. I'm creating Tuple environments which hold other Discrete environments (possibly of the same size) and each of those environments is required to have different dynamics, which is not possible if they are initialised with the same seed, as is done currently in seed().

This pull request is backwards compatible and improves the API for various reasons I have outlined in my previous comments and doesn't cause any failures for existing tests. Could you please accept it?

pzhokhov · 2020-08-14T21:13:29Z

Hi @RaghuSpaceRajan ! Sorry about the delay! I like the change about initializing Tuple space seed with an iterable (probably makes sense to add mirror version for Dict space - initialize it with a dict)
Still very confused about the need to include seed into Space constructor though. Specifically, I am proposing to replace this line: https://github.com/anonips/-MDP-Playground/blob/74431f98c210830a93a1bc83fcdcb95bf1644696/rl_toy/envs/rl_toy_env.py#L142
(which reads:

self.observation_space = DiscreteExtended(config["state_space_size"], seed=config["state_space_size"] + config["seed"]) #seed #hack #TODO Gym (and so Ray) apparently needs "observation"_space as a member. I'd prefer "state"_space

with

self.observation_space = Discrete(config["state_space_size"])
self.observation_space.seed(config["state_space_size"] + config["seed"])

think that should have the same result?
Similarly, all other spaces can be seeded immediately after assignment.

RaghuSpaceRajan · 2020-08-19T21:50:14Z

Hi @pzhokhov , thanks for the reply! That's a good idea for the Dict space. I'll add it.

Regarding the change you suggested for seeding Space subclasses immediately after init, doesn't it waste computation to call seed() twice? It is already called in Space.__init__(). It is admittedly, a small computation in most cases.

…t seeding

RaghuSpaceRajan · 2020-08-27T18:33:09Z

Hi @pzhokhov I added the seeding for Dict as well as some test cases.

I see that this pull request makes the argument in my last comment about seeding twice moot and now we have to call seed() after the init in case we want an explicit seed. I still feel though that it's good to at least store the int, that is going to be used as the seed later, explicitly. This would help if someone wants to do some seed dependent initialisation for a Space within its init method. What do you think?

P.S.: I see that I confused Space and Env in one of my comments above. Sorry about that. I struck through that comment part.

jkterry1 · 2021-07-31T15:24:57Z

Reviewer: @Bam4d

@RaghuSpaceRajan could you please fix the merge conflict if you see this?

Bam4d · 2021-08-04T08:06:56Z

@jkterry1 @RaghuSpaceRajan It looks to me like this is a good ticket to merge, but not without a few changes. I agree that the Tuple and Dict space seed function should iteratively seed sub-spaces for reproducibility, but having the seed in __init__ is superflous and should be removed from this PR.

Bam4d

remove the seeds from the init functions, but keep the recursive seeding for dict and tuple spaces

jkterry1 · 2021-08-05T14:42:56Z

I'm going to close this PR as it appears to have gone stale and I created an accompanying issue for adding these changes going forward. If anyone would be willing to create a new updated PR thats based off master and includes the fixes bam4d mentioned, I would greatly appreciate it.

RaghuSpaceRajan · 2021-08-05T18:00:15Z

Hey, @jkterry1 , could you please re-open the issue? I've been sick lately, sorry. I'll get to it soon. The conflicts had looked fixable when I glanced through them.

I did have a question about seed in __init__ that wasn't answered above. Having the seed in __init__ would help if someone would want to do some seed dependent initialisation for a Space within its init method. What do you think about this?

jkterry1 · 2021-08-10T19:47:16Z

@Bam4d can you answer the question about the init method when you get a chance?

Bam4d · 2021-08-11T18:40:24Z

I guess its a bit cleaner if the seed is in the __init__ as it reduces the function calls. I have no problem with this, I'm happy for it to be merged.

benblack769 · 2021-08-12T13:47:51Z

While the PR generally looks good, I think it needs better support for passing in a plain integer as a seed. I don't like the added complexity of having to pass in lots of different seeds for a single space. This sounds unnecessary to me. A better, simpler, more general solution to your problem that is easier for RL frameworks to handle, and API backwards compatible is to simply give different seeds to each sub-space by adding constants to the base seed.

For tuples, this is simple, you can use the same seeding strategy as is used in vector environments: https://github.com/openai/gym/blob/master/gym/vector/sync_vector_env.py#L54

def seed(self, seeds=None):
	if seeds is None:
		seeds = [None for _ in range(len(self.spaces))]
	if isinstance(seeds, int):
		seeds = [seeds + i for i in range(len(self.spaces))]
	assert len(seeds) == self.num_envs

	for space, seed in zip(self.spaces, seeds):
		space.seed(seed)

For dicts, things get a bit more subtle. If the key is orderable, you can just sort the list of items and do the same thing as for tuples. If the key is not orderable, you might be able to get a sufficiently good solution by adding the hash of the seed (of course, the hash is well defined otherwise the dict would not work).

def seed(self, seeds=None):
	if isinstance(seeds, int):
		seeds = {key: seed + hash(key) for key, space in self.spaces.items()}
	...

@Bam4d @RaghuSpaceRajan
Do you think this strategy is good?

RaghuSpaceRajan · 2021-08-12T15:07:46Z

@benblack769 , the changes I made are already backwards compatible. If one doesn't pass a list of seeds for Tuple or a dict of seeds for Dict, the old procedure is used to seed.

Additionally, I'm no authority on this but I'm aware that linearly increasing seeds may not be the best idea (https://stackoverflow.com/a/1554995/11063709). It may/may not be a problem for RL envs, I can't say for sure.

@Bam4d Thanks for the input! I'll try to finish the PR today.

RaghuSpaceRajan · 2021-08-12T16:18:20Z

Hi @Bam4d , I made the remaining changes and the tests are also passing. Please let me know if you think something's still missing.

benblack769 · 2021-08-16T23:26:04Z

So the Gym developers were aware of the linearly increasing seed problem, and use a crytographic grade hash on the seed before putting it into numpy:
https://github.com/openai/gym/blob/master/gym/utils/seeding.py#L45
https://github.com/openai/gym/blob/master/gym/spaces/space.py#L43

benblack769 · 2021-08-16T23:28:20Z

So this is not a problem. I think that the plan I laid out above is simple and much better than the current state of things.

RaghuSpaceRajan · 2021-08-18T15:42:08Z

@benblack769, thanks for the links, good to know!

So, the difference in the current approach and what you propose is only when an int is passed, right?

And you propose that when an int is passed, even then different seeds should be used for seeding the sub-spaces.

I think there are advantages and disadvantages for the proposed approach:
Advantage: Easy to pass an int to let a vector (list/dict) of values be used. The vectors you suggested above are good defaults.
Disadvantage: Less intuitive that an int is "broadcast" to a vector of values non-trivially. Someone might expect the same int to be used as seed for all sub-spaces when passing a single int.

I personally am in favour of the current approach because it feels more intuitive to me that passing an int uses the same int for all sub-spaces, and because it takes just an extra line of code to go from the int to the vector of seed values that you suggested in your code.

Having said that, I'm fine with the new approach you proposed as well in case a majority is in favour of it. Maybe, @Bam4d can break the tie.

jkterry1 · 2021-08-26T20:00:36Z

@RaghuSpaceRajan please do it Ben's way

RaghuSpaceRajan · 2021-09-06T15:18:55Z

@jkterry1 @Bam4d @benblack769 #2365 has created merge conflicts with this PR.

#2365 tries to fix the sub-case of the issue here, when the seed passed would be an int. This conflicts with the way @benblack769 suggested to do it above - #2365 suggests randomly generating multiple seeds based on the passed seed while @benblack769 suggested adding either hash(key) (for a Dict) or sub-space index (for a Tuple) to the seed. So, which of these 2 methods should I use?

Additional info: #2365 also fixed the bug that seed() did not return a list for Dict and Tuple. I assume that's compatible with this issue. We, may, however want seed() to return a dict in case of Dict. What do you think?

benblack769 · 2021-09-06T18:22:40Z

Generating random seeds to populate the elements of a tuple env should be fine. I see no problem with that. For a dict env, I don't like it because dictionary order isn't a reliable property of a dictionary, only the set of keys is. Unfortunately, I've learned that hashing is also not reliable. So perhaps the order is the best that we can do, and the current implementation is fine.
I think it would be best for seed() to return whatever the seed method accepts as its argument. If it accepts a dict, then return a dict.

RaghuSpaceRajan · 2021-09-06T20:10:06Z

@benblack769 , thanks for the input!

Regarding point 2, I agree with you. However, the default return of seed() for Space (and so I assume for all spaces) is a list, even when an int is passed. So, do you have any thoughts on how we could make it consistent for all classes?

RaghuSpaceRajan · 2021-09-07T23:29:32Z

So, I resolved the other issues. Still not sure about how to resolve the last issue (see previous comment). Why is even the default return of seed() for Space a list and not an int?

benblack769 · 2021-09-09T13:30:56Z

I have no idea. Like I said before, it makes most sense to me that what is returned from seed() can be passed back into it. @jkterry1 Is this something that can be changed or is it something that we just have to deal with?

jkterry1 · 2021-09-11T21:40:21Z

@RaghuSpaceRajan after discussion I think the least bad option is to do a one time breaking change here such that seed always returns the same datastructure.

RaghuSpaceRajan · 2021-09-13T00:08:46Z

@jkterry1 thanks for the input! seed() currently always returns a list of ints so I presume I do not need to change anything else now. Did I get that right?

jkterry1 · 2021-09-13T00:48:15Z

@RaghuSpaceRajan I think so, but just to confirm the dtype of the output of seed here is the same as the input right?

RaghuSpaceRajan · 2021-09-13T17:07:57Z

I think then that there has been a misundertanding. @jkterry1, the output of seed() is currently not the same for (almost) any of the classes. The default return of seed() for all spaces is a list, even when an int is passed (that's how it is tested here:

gym/gym/spaces/tests/test_spaces.py

Line 224 in 89c8bfb

def test_seed_returns_list(space):

).

We would need to change Space to change the default return of seed() to an int so that it matches its argument. I could do that but I do not know if that might have any cascading effects beyond just failing the above test, e.g., someone using Gym might have written code that depends on the return of seed() being a list.

jkterry1 · 2021-09-13T18:07:28Z

In this case I'm going to merge this and create a separate issue to figure out what seed should return in the future

XuehaiPan · 2021-09-19T12:28:48Z

gym/spaces/dict.py

    def seed(self, seed=None):
-        seed = super().seed(seed)
-        try:
-            subseeds = self.np_random.choice(
-                np.iinfo(int).max,
-                size=len(self.spaces),
-                replace=False,  # unique subseed for each subspace
-            )
-        except ValueError:
-            subseeds = self.np_random.choice(
-                np.iinfo(int).max,
-                size=len(self.spaces),
-                replace=True,  # we get more than INT_MAX subspaces
-            )
-
-        for subspace, subseed in zip(self.spaces.values(), subseeds):
-            seed.append(subspace.seed(int(subseed))[0])
-
-        return seed
+        seeds = []
+        if isinstance(seed, dict):
+            for key, seed_key in zip(self.spaces, seed):
+                assert key == seed_key, print(
+                    "Key value",
+                    seed_key,
+                    "in passed seed dict did not match key value",
+                    key,
+                    "in spaces Dict.",
+                )
+                seeds += self.spaces[key].seed(seed[seed_key])
+        elif isinstance(seed, int):
+            seeds = super().seed(seed)
+            try:
+                subseeds = self.np_random.choice(
+                    np.iinfo(int).max,
+                    size=len(self.spaces),
+                    replace=False,  # unique subseed for each subspace
+                )
+            except ValueError:
+                subseeds = self.np_random.choice(
+                    np.iinfo(int).max,
+                    size=len(self.spaces),
+                    replace=True,  # we get more than INT_MAX subspaces
+                )
+
+            for subspace, subseed in zip(self.spaces.values(), subseeds):
+                seeds.append(subspace.seed(int(subseed))[0])
+        elif seed is None:
+            for space in self.spaces.values():
+                seeds += space.seed(seed)
+        else:
+            raise TypeError("Passed seed not of an expected type: dict or int or None")
+
+        return seeds


Should we call super().seed(seed) to seed self.np_random as well when seed is dict or None here?

We could merge the last two cases (seed is int or None) into one. After statement seeds = super().seed(seed), self.np_random become seeded and variable seeds is a list of integers.

Should we only add the main seed of subspace (subspace.seed()[0]) to the return value rather than extend all items in subspace.seed() (the op seeds += in the first and the last cases)?
If the subpace of Dict here is another compound space (Tuple or Dict), the length of the return list could be different len(d.seed(None)) != len(d.seed(0)). (However, I think we probably only use the first item, and hardly ever use the rest in the list and the length of the list. This issue probably never affects normal users.)

gym/gym/spaces/tests/test_spaces.py

Lines 270 to 277 in d35d211

def test_seed_returns_list(space):

def assert_integer_list(seed):

assert isinstance(seed, list)

assert len(seed) >= 1

assert all([isinstance(s, int) for s in seed])

assert_integer_list(space.seed(None))

assert_integer_list(space.seed(0))

We could add a new test assert len(space.seed(None)) == len(space.seed(0)) here.

I'd say it causes minimal harm and could be done. Doing this would create the _np_random PRNG of the "base" class's object and make it consistent with other spaces in case all Space objects are expected to have an _np_random that is not None, although I assume that only _np_randoms of the sub-classes are ever used.

It's subjective. For the case, when it's None, I assume the user would want the seed of every sub-space to be as random as possible in the sense that they would want all of them to be seeded with None. But generating the seeds for sub-spaces using an _np_random PRNG generated based on None (i.e., the suggested way) should also be fine.

I don't think that there's an expected fixed length of the list of seeds returned, so I do not think it makes a difference either way. In any case, the length of the list of seeds returned can always be calculated if the sub-spaces are known, in case a user really wants dig deeper.

So, in general, I'm fine with any of the suggestions either being or not being implemented (also for the suggestions on Tuple below). Please let me know.

XuehaiPan · 2021-09-19T12:34:04Z

gym/spaces/tuple.py

    def seed(self, seed=None):
-        seed = super().seed(seed)
-        try:
-            subseeds = self.np_random.choice(
-                np.iinfo(int).max,
-                size=len(self.spaces),
-                replace=False,  # unique subseed for each subspace
-            )
-        except ValueError:
-            subseeds = self.np_random.choice(
-                np.iinfo(int).max,
-                size=len(self.spaces),
-                replace=True,  # we get more than INT_MAX subspaces
-            )
+        seeds = []
+
+        if isinstance(seed, list):
+            for i, space in enumerate(self.spaces):
+                seeds += space.seed(seed[i])
+        elif isinstance(seed, int):
+            seeds = super().seed(seed)
+            try:
+                subseeds = self.np_random.choice(
+                    np.iinfo(int).max,
+                    size=len(self.spaces),
+                    replace=False,  # unique subseed for each subspace
+                )
+            except ValueError:
+                subseeds = self.np_random.choice(
+                    np.iinfo(int).max,
+                    size=len(self.spaces),
+                    replace=True,  # we get more than INT_MAX subspaces
+                )

-        for subspace, subseed in zip(self.spaces, subseeds):
-            seed.append(subspace.seed(int(subseed))[0])
+            for subspace, subseed in zip(self.spaces, subseeds):
+                seeds.append(subspace.seed(int(subseed))[0])
+        elif seed is None:
+            for space in self.spaces:
+                seeds += space.seed(seed)
+        else:
+            raise TypeError("Passed seed not of an expected type: list or int or None")

-        return seed
+        return seeds


Same comment here for Tuple (see Dict above).

Call super().seed(seed).

Merge cases seed is int and seed is None.

Only add the main seed of the subspace to the return value.

RaghuSpaceRajan added 3 commits December 31, 2019 15:55

Make the seed in default initialization controllable

e003db1

Since seed() is being called in default initialization of Space, it should be controllable for reproducibility.

Updated derived classes of Space to have their seeds controllable at …

dba1186

…initialization.

Merge branch 'master' into patch-1

f8c415e

RaghuSpaceRajan closed this Feb 19, 2020

RaghuSpaceRajan added 2 commits May 19, 2020 22:28

Allow Tuple's spaces to each have their own seed

ca6165a

Merge branch 'master' into patch-1

a083e53

RaghuSpaceRajan reopened this May 19, 2020

Added dict based seeding for Dict space; test cases for Tuple and Dic…

870090f

…t seeding

jkterry1 added the Needs Reviewer label Jul 30, 2021

jkterry1 removed the Needs Reviewer label Jul 31, 2021

Bam4d suggested changes Aug 4, 2021

View reviewed changes

jkterry1 mentioned this pull request Aug 5, 2021

Make the seed in default initialization controllable (Updated PR) #2299

Closed

jkterry1 closed this Aug 5, 2021

jkterry1 reopened this Aug 5, 2021

RaghuSpaceRajan added 2 commits August 10, 2021 21:45

Merge branch 'master' into patch-1

82ade50

Update discrete.py

c3c6891

Update test_spaces.py

eea66e5

RaghuSpaceRajan and others added 3 commits August 12, 2021 17:28

Merge branch 'openai:master' into patch-1

145160b

Add seed to __init__()

1d41682

blacked

7f1ced1

RaghuSpaceRajan and others added 3 commits September 7, 2021 20:44

Merge branch 'master' into patch-1

57f1242

Fix black

5944562

Fix failing tests

6d7725d

RaghuSpaceRajan changed the title ~~Make the seed in default initialization controllable~~ Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable Sep 7, 2021

Merge branch 'master' into patch-1

a6cd3fb

jkterry1 merged commit c571b0d into openai:master Sep 13, 2021

jkterry1 mentioned this pull request Sep 13, 2021

[Question] Should seed() always return a list? #2406

Closed

XuehaiPan reviewed Sep 19, 2021

View reviewed changes

	def test_seed_returns_list(space):
	def assert_integer_list(seed):
	assert isinstance(seed, list)
	assert len(seed) >= 1
	assert all([isinstance(s, int) for s in seed])

	assert_integer_list(space.seed(None))
	assert_integer_list(space.seed(0))

Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable #1774

Make Tuple and Dicts be seedable with lists and dicts of seeds + make the seed in default initialization controllable #1774

Conversation

RaghuSpaceRajan commented Dec 31, 2019

pzhokhov commented Feb 10, 2020 • edited Loading

RaghuSpaceRajan commented Feb 10, 2020 • edited Loading

RaghuSpaceRajan commented Feb 10, 2020

RaghuSpaceRajan commented Feb 19, 2020

RaghuSpaceRajan commented May 19, 2020 • edited Loading

pzhokhov commented Aug 14, 2020 • edited Loading

RaghuSpaceRajan commented Aug 19, 2020

RaghuSpaceRajan commented Aug 27, 2020

jkterry1 commented Jul 31, 2021

Bam4d commented Aug 4, 2021

Bam4d left a comment

Choose a reason for hiding this comment

jkterry1 commented Aug 5, 2021

RaghuSpaceRajan commented Aug 5, 2021

jkterry1 commented Aug 10, 2021

Bam4d commented Aug 11, 2021

benblack769 commented Aug 12, 2021 • edited Loading

RaghuSpaceRajan commented Aug 12, 2021

RaghuSpaceRajan commented Aug 12, 2021

benblack769 commented Aug 16, 2021

benblack769 commented Aug 16, 2021

RaghuSpaceRajan commented Aug 18, 2021

jkterry1 commented Aug 26, 2021

RaghuSpaceRajan commented Sep 6, 2021

benblack769 commented Sep 6, 2021

RaghuSpaceRajan commented Sep 6, 2021

RaghuSpaceRajan commented Sep 7, 2021

benblack769 commented Sep 9, 2021

jkterry1 commented Sep 11, 2021

RaghuSpaceRajan commented Sep 13, 2021

jkterry1 commented Sep 13, 2021

RaghuSpaceRajan commented Sep 13, 2021

jkterry1 commented Sep 13, 2021

XuehaiPan Sep 19, 2021 • edited Loading

Choose a reason for hiding this comment

RaghuSpaceRajan Oct 1, 2021

Choose a reason for hiding this comment

XuehaiPan Sep 19, 2021 • edited Loading

Choose a reason for hiding this comment

pzhokhov commented Feb 10, 2020 •

edited

Loading

RaghuSpaceRajan commented Feb 10, 2020 •

edited

Loading

RaghuSpaceRajan commented May 19, 2020 •

edited

Loading

pzhokhov commented Aug 14, 2020 •

edited

Loading

benblack769 commented Aug 12, 2021 •

edited

Loading

XuehaiPan Sep 19, 2021 •

edited

Loading

XuehaiPan Sep 19, 2021 •

edited

Loading