Gym notebook #54

nhuet · 2021-09-07T14:29:04Z

Notebook showing how to define a domain based on a gym environment and solve it with scikit-decide.

nhuet · 2021-09-07T14:32:32Z

Some explanations are still missing, both about the algorithms used to solve the problem and about the problem itself. Help with that would be greatly appreciated!

nhuet · 2021-09-07T14:33:33Z

I introduced the IW solver as suggested by @fteicht, but it does not seem to work, even though I tried to reproduce what was done in full_multisolve.py

nhuet · 2021-09-07T14:35:06Z

This notebook needs ffmpeg as an additional dependency. It is already available on colab and I added it on binder, but for local jupyter users I can only state at the beginning of the notebook that they need to install it themselves, as the installation is platform dependent.

nhuet · 2021-09-07T14:37:44Z

Here is the link to test it on binder

galleon · 2021-09-13T12:30:16Z

@nhuet binder seems to be super slow. I tried your link above; it ran for 5 min before closing with this error:

Found built image, launching...
Launching server...
Failed to connect to event stream

Any idea what could be the problem ?

fteicht · 2021-09-13T13:29:42Z

@nhuet can you briefly describe the issue with IW?

nhuet · 2021-09-13T13:47:08Z

@nhuet binder seems to be super slow. I tried your link above; it ran for 5 min before closing with this error:
Found built image, launching...
Launching server...
Failed to connect to event stream
Any idea what could be the problem ?

It can happen sometimes, have you tried to relaunch?

nhuet · 2021-09-13T14:02:58Z

@nhuet can you briefly describe the issue with IW?

Since the domain is closed at each render (see DeterministicGymDomain._render_from()), the monitor wrapper used to record a movie of the rollout captures only one frame. As a result:

in colab and binder, the movie shown after the rollout contains a single image and lasts 0 s
in local jupyter, the OpenGL window keeps opening and closing during the rollout

nhuet · 2021-10-04T13:29:11Z

@neo-alex and @fteicht: I inserted your explanatory texts, so the notebook is ready for a first review. Some texts are probably still missing (I put placeholders to show where).

nhuet · 2021-10-04T13:32:26Z

@neo-alex: more importantly, I included a text saying that PPO does well in this context, but with its current settings it is actually the only solver that does not work (even though it is an RL one)! I have total_timesteps=50000; what else can I tune to make it work? (All of this is random and I have not fixed the seed for now, but PPO never worked during my tests, while CGP and IW always did.)

nhuet · 2021-10-04T14:08:19Z

NB: using our own rollout avoids having to use the monitor wrapper and ffmpeg, and fixes the display issue with IW. We are now back to "real-time" rendering: no movie has to be prepared before anything is shown. It is not as smooth on binder as in a local jupyter session, but it still works.
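
To make the idea concrete, here is a minimal sketch of such a hand-written rollout loop. It is not the rollout function from the notebook's utils.py: ToyCarEnv, its reset/step/render methods and the constant policy are hypothetical stand-ins, and the "frames" are plain strings rather than images. It only illustrates collecting one frame per step ourselves and checking the goal from the position, without any monitor wrapper.

```python
from dataclasses import dataclass


@dataclass
class ToyCarEnv:
    """Hypothetical stand-in for a gym-like environment (1-D position only)."""

    goal: float = 0.45
    position: float = -0.5

    def reset(self) -> float:
        self.position = -0.5
        return self.position

    def step(self, action: float):
        # Move by the action; the episode ends once the goal position is reached.
        self.position += action
        done = self.position >= self.goal
        return self.position, done

    def render(self) -> str:
        # A real environment would return an image; a string is enough here.
        return f"car at x={self.position:.2f}"


def rollout(env, policy, max_steps=500):
    """Run one episode and collect one frame per step (no monitor wrapper)."""
    observation = env.reset()
    frames = [env.render()]
    goal_reached = False
    for _ in range(max_steps):
        observation, done = env.step(policy(observation))
        frames.append(env.render())
        if done:
            goal_reached = True
            break
    return frames, goal_reached


# Usage: a constant "push right" policy, purely for illustration.
frames, goal_reached = rollout(ToyCarEnv(), policy=lambda obs: 0.1)
```

Because the frames are gathered inside the loop, they can be displayed as they arrive, which is what gives the "real-time" rendering mentioned above.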

nhuet · 2021-10-18T13:37:45Z

References and explanations have been added where necessary (still to be reviewed), except for one thing: explaining why we need to declare a different gym wrapper class for IW (namely GymDomainForWidthSolvers), why it does not already exist in the library, and how the state_features() method is implemented, so that readers can see how to do the same for their own use cases.

fteicht · 2021-10-19T16:53:53Z

Thanks @nhuet !
Some comments:

The description of PPO as optimizing a surrogate objective function is not very clear to me. I would say that PPO directly optimizes the weights of the policy network using stochastic gradient ascent.
Link to utils.py: I think you can directly link to the rollout function in the script
when you say "Some RL algorithms intrinsically favouring exploration like SAC (Soft Actor-Critic) might still work", do they work or not? If they don't systematically work, or if they work only sporadically, we should clearly say so. Otherwise, people might wonder why we don't show an RL algorithm which is supposed to work.
CGP is not working better than PPO on my binder instance! So big warning here. It seems that sometimes it does not work (how often? Was I unlucky? In any case, if it happens, we have to at least warn the reader.)
"Perhaps more explanation about what we are doing here and why it is required ?" => The IW solver needs domain characteristics (in fact methods) inherited from GymPlanningDomain, GymDiscreteActionDomain and GymWidthDomain. In addition, we must provide the state_features method, which is required by IW. No such domain is provided by default in scikit-decide, partly because it would target only the IW algorithm, which is why we need to define the GymDomainForWidthSolvers domain. We can think later about providing this domain by default in scikit-decide if we find it useful.
when you call the IW solver constructor, you can mention that bee2_features is the state feature function we use in Gym environments for IW to dynamically increase state variable intervals. In other domains, other state features might be more suitable.
"IW algorithm was able to find an efficient solution in less than 1 second": it depends on the computer...

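To illustrate what a state_features method gives a width-based solver, here is a minimal sketch. It is a deliberately simplified, hypothetical stand-in and not the bee2_features implementation from scikit-decide: it uses a fixed grid instead of dynamically growing intervals, and the names grid_state_features and is_novel, the bin count and the bounds are all assumptions made for the example. The novelty test shown is the width-1 case, where a state is kept only if at least one of its feature values has not been seen before.

```python
def grid_state_features(state, lows, highs, bins=10):
    """Map each continuous state variable to the index of a fixed-size interval."""
    features = []
    for value, low, high in zip(state, lows, highs):
        ratio = (value - low) / (high - low)
        # Clamp so that values on or outside the bounds fall in the edge bins.
        index = min(bins - 1, max(0, int(ratio * bins)))
        features.append(index)
    return tuple(features)


def is_novel(features, seen_atoms):
    """Width-1 novelty: True if at least one (feature index, value) atom is new."""
    atoms = set(enumerate(features))
    new_atoms = atoms - seen_atoms
    seen_atoms |= atoms  # remember every atom of this state for later tests
    return bool(new_atoms)


# Assumed bounds for (position, velocity), used only for this illustration.
LOWS = (-1.2, -0.07)
HIGHS = (0.6, 0.07)

seen = set()
first = is_novel(grid_state_features((-0.5, 0.0), LOWS, HIGHS), seen)
nearby = is_novel(grid_state_features((-0.49, 0.0), LOWS, HIGHS), seen)
far = is_novel(grid_state_features((0.5, 0.0), LOWS, HIGHS), seen)
```

In this sketch, two states that fall into the same intervals are indistinguishable for the novelty test, which is why the choice of features matters and why other domains may need different ones.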
nhuet · 2021-10-21T09:12:30Z

Link to utils.py: I think you can directly link to the rollout function in the script

The problem with that kind of link is that it points

to a specific commit (f469c80), whereas I would rather point to the master branch, as the notebook is not likely to change very often
to a given line number (here 89), so once we link to the master branch rather than to a commit, the link will no longer work properly if the file changes, even though the rollout function still exists

That is why I chose to point to the script rather than to the function: I did not find a way to point to the function without freezing the code version as you did.

nhuet · 2021-10-21T09:18:33Z

when you say "Some RL algorithms intrinsically favouring exploration like SAC (Soft Actor-Critic) might still work", do they work or not? If they don't systematically work, or if they work only sporadically, we should clearly say so. Otherwise, people might wonder why we don't show an RL algorithm which is supposed to work.

It was a comment from @neo-alex, who managed to make it work. However, I could not, which is why I did not show it myself. Maybe we can just drop the sentence.

nhuet · 2021-10-21T13:50:04Z

"IW algorithm was able to find an efficient solution in less than 1 second": it depends on the computer...

I removed the reference to an explicit time. The computation was actually instantaneous on my laptop and on binder, so I felt quite safe saying "less than 1 second". Still, I replaced it with "in a very short time".

nhuet · 2021-10-21T14:09:52Z

CGP is not working better than PPO on my binder instance! So big warning here. It seems that sometimes it does not work (how often? Was I unlucky? In any case, if it happens, we have to at least warn the reader.)

Indeed! It also happens for me on binder, oops. Doubling n_it seems to make it work though, and I added a warning as suggested. (On my laptop it never failed, strangely...)

nhuet · 2021-10-21T14:13:08Z

Actually, it depends on the episode. So perhaps it is better to launch several episodes instead of increasing the number of iterations during solve.
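
As a rough sketch of what "launching several episodes" could look like, the snippet below runs a number of episodes and reports how often the goal is reached. Everything in it is hypothetical: run_episodes and flaky_episode are illustrative names, and the episode is simulated by a random draw with an assumed success probability rather than by an actual CGP rollout.

```python
import random


def run_episodes(run_episode, n_episodes, seed=0):
    """Run several episodes and return the success rate and per-episode outcomes."""
    rng = random.Random(seed)  # seeded so that repeated runs are comparable
    outcomes = [bool(run_episode(rng)) for _ in range(n_episodes)]
    return sum(outcomes) / n_episodes, outcomes


def flaky_episode(rng, success_probability=0.7):
    """Hypothetical episode that reaches the goal with a fixed probability."""
    return rng.random() < success_probability


success_rate, outcomes = run_episodes(flaky_episode, n_episodes=10)
print(f"goal reached in {sum(outcomes)} of {len(outcomes)} episodes")
```

Reporting a rate over several episodes makes it visible to the reader that the goal is not always reached, instead of relying on a single lucky or unlucky run.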

nhuet · 2021-10-21T15:31:23Z

@fteicht: I think I have taken all your comments into account. Is it OK to accept the PR?

fteicht · 2021-10-21T15:42:33Z

@nhuet Thanks, the PR is fine with me, I am accepting it.

- modify summary of PPO
- skip mentioning SAC as it does not work either (at least on my laptop)
- add a warning that CGP can fail
- add explanation about the new wrapping class necessary for IW
- use qualitative rather than quantitative computation time for IW in conclusion

nhuet force-pushed the gym_nb branch from 83c3976 to 624ac65 Compare September 30, 2021 20:40

nhuet marked this pull request as ready for review October 4, 2021 13:32

nhuet force-pushed the gym_nb branch 2 times, most recently from a57a765 to 1bfc4db Compare October 7, 2021 13:16

fteicht self-requested a review October 20, 2021 13:21

fteicht requested changes Oct 20, 2021

nhuet force-pushed the gym_nb branch from ecf9e59 to 66e6cc7 Compare October 21, 2021 07:50

nhuet force-pushed the gym_nb branch from 0cceaed to 228b418 Compare October 21, 2021 15:30

fteicht self-requested a review October 21, 2021 15:43

init gym notebook on mountain continuous car, using existing examples …

973b214

…CGP and StableBaseline

nhuet added 19 commits October 22, 2021 09:37

add iw solver and display management for colab and binder

086622c

show rollout as movies

add a comment to install ffmpeg for local jupyter

c36f9e4

adding an intro

8970d4d

Add explanation texts and use own rollout function.

a749df3

+ split rollout from solve
+ using own rollout avoids the need for monitor wrapper and ffmpeg
+ using own rollout fixes the display issue for IW (the OpenGL window still pops up/closes on local sessions though)

clearing outputs

742784f

sanitize imports + add reference for IW

395813e

removing domain factory for movies

c88658d

adding more description for the problem

727db54

Changing notebook filename to be consistent with maze notebook

de00f25

update rollout function:

0e5ac2b

- check goal reached with position
- modification for IW as observation is wrapped in a GymDomainStateProxy

max_steps = 500 to shorten failing episodes

fd37e70

remove duplicate of last image when rolling out

1dcc4eb

Add references and explanations.

b327a6e

adding comments on RL algo with badly-shaped rewards

5c9d587

add insight on CGP good behaviour

2a1682e

adding explanation for IW (still not satisfactory to my taste)

2114a3e

add explanation about bee2_features.

7e05b93

(also removed from class definition as it is specified directly when instantiating the IW solver)

rollout several episodes for CGP

09b5efb

in order to see that goal is not always reached (or not reached)

nhuet force-pushed the gym_nb branch from 228b418 to 09b5efb Compare October 22, 2021 07:37

nhuet added 3 commits October 22, 2021 09:41

clear outputs

5140f54

removing non relevant cell

3a93a98

back to render=True for PPO

9b7c873

fteicht approved these changes Oct 22, 2021

fteicht merged commit ba2d70b into airbus:master Oct 22, 2021

nhuet deleted the gym_nb branch November 26, 2021 08:59
