
Gym notebook #54

Merged (23 commits) Oct 22, 2021
Conversation

@nhuet (Contributor) commented Sep 7, 2021

Notebook presenting how to define a gym-environment-based domain and solve it with scikit-decide.

@nhuet (Contributor, Author) commented Sep 7, 2021

Still missing some explanation about the algorithms used to solve the problem and about the problem itself. Help with that would be greatly appreciated!

@nhuet (Contributor, Author) commented Sep 7, 2021

I introduced the IW solver as suggested by @fteicht, but it does not seem to work, even though I tried to reproduce what was done in full_multisolve.py.

@nhuet (Contributor, Author) commented Sep 7, 2021

This notebook needs ffmpeg as an additional dependency. It already exists on Colab and I added it on Binder, but for local Jupyter users I can only state at the beginning of the notebook that they need to install it themselves, as the installation is platform dependent.

@nhuet (Contributor, Author) commented Sep 7, 2021

Here is the link to test it on Binder: Binder

@galleon (Contributor) commented Sep 13, 2021

@nhuet Binder seems to be super slow. I tried against your link above; it ran for 5 min before closing with this error:

Found built image, launching...
Launching server...
Failed to connect to event stream

Any idea what could be the problem ?

@fteicht (Collaborator) commented Sep 13, 2021

@nhuet can you briefly describe the issue with IW?

@nhuet (Contributor, Author) commented Sep 13, 2021

> @nhuet Binder seems to be super slow. I tried against your link above; it ran for 5 min before closing with this error:
>
> Found built image, launching...
> Launching server...
> Failed to connect to event stream
>
> Any idea what could be the problem?

It can happen sometimes; have you tried relaunching?

@nhuet (Contributor, Author) commented Sep 13, 2021

> @nhuet can you briefly describe the issue with IW?

The domain being closed at each render (because of DeterministicGymDomain._render_from()), the monitor wrapper used to record a movie of the rollout captures only one frame. Thus:

  • on Colab and Binder, we only see a movie with a single image, lasting 0 s, after the rollout;
  • moreover, in local Jupyter, the OpenGL window keeps opening/closing during the rollout.

@nhuet (Contributor, Author) commented Oct 4, 2021

@neo-alex and @fteicht: I inserted your explanation texts, so the notebook is ready for a first review. Some texts are probably still missing (I put placeholders to show where).

@nhuet (Contributor, Author) commented Oct 4, 2021

@neo-alex: more importantly, I included a text saying that PPO does well in this context, but with its current settings it is actually the only solver that does not work (even though it is an RL one)! I have total_timesteps=50000; what else can I tune to make it work? (All of this is random and I did not fix the seed for now, but PPO never worked during my tests, while CGP and IW always did.)

@nhuet nhuet marked this pull request as ready for review October 4, 2021 13:32
@nhuet (Contributor, Author) commented Oct 4, 2021

NB: using our own rollout avoids having to use the monitor wrapper and ffmpeg, and fixes the display issue with IW. We are actually back to "real time" rendering; no more movie prepared before seeing something. It is not so smooth on Binder (compared to a local Jupyter session), but still works.
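A hand-rolled rollout of this kind can be sketched as below. This is an illustrative stand-in, not the scikit-decide rollout function discussed in this PR: DummyEnv and the function names are made up for the example, and it assumes the classic gym step API of the time, returning (observation, reward, done, info), with render() called each step instead of recording a movie.

```python
# Minimal rollout loop rendering each step directly (no monitor/ffmpeg).
# DummyEnv is a stand-in for a real gym environment; it follows the
# classic gym API: reset() -> obs, step(a) -> (obs, reward, done, info).
class DummyEnv:
    def __init__(self, horizon=3):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}

    def render(self):
        pass  # a real env would draw a frame here (e.g. inline in a notebook)


def rollout(env, policy, max_steps=100):
    """Run one episode, rendering as we go, and return the total reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        env.render()  # "real time" rendering, one frame per step
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```

For instance, `rollout(DummyEnv(), lambda obs: 0)` runs the 3-step dummy episode and returns 3.0.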

@nhuet nhuet force-pushed the gym_nb branch 2 times, most recently from a57a765 to 1bfc4db Compare October 7, 2021 13:16
@nhuet (Contributor, Author) commented Oct 18, 2021

References and explanations have been added as necessary (still to be reviewed), except for one thing: explaining why we need to declare a different gym wrapper class for IW (aka GymDomainForWidthSolvers), why it does not already exist in the library, and how the state_features() method is implemented, so that readers can see how to do that for their own use cases.
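As a rough idea of what such a state_features() implementation can look like, here is a sketch. The class below is a hypothetical stand-in, not the actual scikit-decide GymDomainForWidthSolvers (which also inherits domain characteristics not reproduced here); it only illustrates the feature-extraction contract that IW relies on.

```python
# Illustrative stand-in for the wrapper class discussed above; the real
# scikit-decide class additionally derives from planning/width domain mixins.
class GymDomainForWidthSolvers:
    def state_features(self, state):
        # Width-based solvers such as IW only need, per state, a flat
        # vector of scalar features. For a Box observation (e.g. the 4
        # CartPole state variables) the raw components are a natural choice;
        # other domains may call for hand-crafted features instead.
        return [float(x) for x in state]
```

For example, `GymDomainForWidthSolvers().state_features([0.5, -0.25])` returns `[0.5, -0.25]`.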

@fteicht (Collaborator) commented Oct 19, 2021

Thanks @nhuet!
Some comments:

  • The description of PPO as optimizing a surrogate objective function is not very clear to me. I would say that PPO directly optimizes the weights of the policy network using stochastic gradient ascent.
  • Link to utils.py: I think you can directly link to the rollout function in the script.
  • When you say "Some RL algorithms intrinsically favouring exploration like SAC (Soft Actor-Critic) might still work", do they work or not? If they don't systematically work, or if they work only sporadically, we should clearly say so. Otherwise, people might wonder why we don't show an RL algorithm which is supposed to work.
  • CGP is not working better than PPO on my Binder instance! So big warning here. It seems that sometimes it does not work (how often? Was I unlucky?). If it can happen, we have to warn the reader at least.
  • "Perhaps more explanation about what we are doing here and why it is required?" => The IW solver needs domain characteristics (in fact methods) inherited from GymPlanningDomain, GymDiscreteActionDomain and GymWidthDomain. In addition, we must provide the state_features method, which is required by IW. No such domain is provided by default in scikit-decide, partly because it would target only the IW algorithm, which is why we need to define the GymDomainForWidthSolvers domain. We can think later of providing this domain by default in scikit-decide if we find it useful.
  • When you call the IW solver constructor, you can mention that bee2_features is the state features we use in the Gym environment for IW to dynamically increase state variable intervals. In other domains, other state features might be more suitable.
  • "IW algorithm was able to find an efficient solution in less than 1 second": it depends on the computer...
@fteicht fteicht self-requested a review October 20, 2021 13:21 and left the same comments as a formal review.

@nhuet (Contributor, Author) commented Oct 21, 2021

> • Link to utils.py: I think you can directly link to the rollout function in the script.

The problem with that kind of link is that it points:

  • to a specific commit (f469c80) -> I would rather point to the master branch, as the notebook is not likely to change very often;
  • to a given line number (here 89), and the link would no longer work properly if the file changes, even though the rollout function still exists (once we use a link to the master branch rather than to a commit).

That's why I chose to point to the script rather than to the function, since I did not find a way to do the latter without freezing the code version as you did.

@nhuet (Contributor, Author) commented Oct 21, 2021

> • When you say "Some RL algorithms intrinsically favouring exploration like SAC (Soft Actor-Critic) might still work", do they work or not? If they don't systematically work, or if they work only sporadically, we should clearly say so. Otherwise, people might wonder why we don't show an RL algorithm which is supposed to work.

It was a comment from @neo-alex, who succeeded in making it work. However, I could not, which is why I did not show it myself. Maybe we can just skip the sentence.

@nhuet (Contributor, Author) commented Oct 21, 2021

  • "IW algorithm was able to find an efficient solution in less than 1 second": it depends on the computer...

I removed reference to explicit time. But actually the computation was instantaneous on my laptop and binder, so i felt quite safe telling "less than 1 second". Still i replaced it by "in a very short time".

@nhuet (Contributor, Author) commented Oct 21, 2021

> • CGP is not working better than PPO on my Binder instance! So big warning here. It seems that sometimes it does not work (how often? Was I unlucky?). If it can happen, we have to warn the reader at least.

Indeed! Same for me with Binder, oops. By doubling n_it it seems to work, though, and I added a warning as suggested. (On my laptop it never failed; strange...)

@nhuet (Contributor, Author) commented Oct 21, 2021

Actually, it depends on the episode. So perhaps it is better to launch several episodes instead of increasing the number of iterations during solve.

@nhuet (Contributor, Author) commented Oct 21, 2021

@fteicht: I think I took all your comments into account. Is it OK to accept the PR?

@fteicht (Collaborator) commented Oct 21, 2021

@nhuet Thanks, the PR looks fine to me; I am accepting it.

@fteicht fteicht self-requested a review October 21, 2021 15:43
nhuet added 19 commits October 22, 2021 09:37
+ split rollout from solve
+ using our own rollout avoids the need for the monitor wrapper and ffmpeg
+ using our own rollout fixes the display issue for IW (the OpenGL window still pops up/closes in local sessions though)
- check goal reached with position
- modification for IW as the observation is wrapped in a GymDomainStateProxy
- modify summary of PPO
- skip mentioning SAC as it is also not working (at least on my laptop)
- add a warning that CGP can fail
- add explanation about the new wrapping class necessary for IW
- use qualitative rather than quantitative computation time for IW in the conclusion
(removed also from the class definition as it is specified directly when instantiating the IW solver)
in order to see that the goal is not always reached (or not reached)
@fteicht fteicht merged commit ba2d70b into airbus:master Oct 22, 2021
@nhuet nhuet deleted the gym_nb branch November 26, 2021 08:59