-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gym notebook #54
Gym notebook #54
Conversation
Still missing some explanation about algorithms used to solve the problem and about the problem itself. Help for that would be greatly appreciated ! |
I introduced IW solver as suggested by @fteicht , but it seems not working even though i tried to reproduced what was done in full_multisolve.py |
This notebook needs ffmpeg to work as an additional dependency. It already exists on colab and i added it on binder but i can only state at the beginning of the notebook that local jupyter users need to install it as it is platform dependent. |
@nhuet binder seems to be super slow. I have tried against your link above, il has run for 5mn closing with this error:
Any idea what could be the problem ? |
@nhuet can you briefly describe the issue with IW? |
It can happen sometimes, have you tried to relaunch? |
The domain being close at each render (because of
|
@neo-alex : more importantly, i included a text telling that PPO is doing well in this context but actually with its current settings, this is the only solver that does not work ! (even though it is a RL one) i have total_timesteps=50000, what else can i tune to make it work? (all of this is random and i did not fix the seed for now, but PPO was never working during my tests,while CGP and IW were always working) |
NB: using our own rollout avoid having to use monitor and ffmpeg, and fix display issue with IW. Actually we are back to a "real time " rendering, no more movie prepared before seeing something. But this is not so smooth on binder (compared to local jupyter session). Still working though |
a57a765
to
1bfc4db
Compare
References and explanation have been added as necessary (still to be reviewed), except for one thing: explaining why we need to declare a different gym wrapper class for IW (aka |
Thanks @nhuet !
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nhuet !
Some comments:
- The description of PPO as optimizing a surrogate objective function is not very clear to me. I would say that PPO directly optimizes the weights of the policy network using stochastic gradient ascent.
- Link to utils.py: I think you can directly link to the rollout function in the script
- when you say "Some RL algorithms intrinsically favouring exploration like SAC (Soft Actor-Critic) might still work", do they work or not? If they don't systematically work, or if they work only sporadically, we should clearly say so. Otherwise, people might wonder why don't show a RL algorithm which is supposed to work.
- CGP is not working better than PPO on my binder instance! So big warning here. It seems that sometimes it does not work (how much time? Were I unlucky? But if it happens, we have to warn the reader at least.)
- "Perhaps more explanation about what we are doing here and why it is required ?" => The IW solver needs domain characteristics (in fact methods) inherited from GymPlanningDomain, GymDiscreteActionDomain and GymWidthDomain. In addition, we must provide the state_features method which is required by IW. There is no such domain provided by default in scikit-decide, also because it would target only the IW algorithm, which is why we need to define the GymDomainForWidthSolvers domain. We can think later of potentially providing this domain by default in scikit-decide if we think it is useful.
- when you call the IW solver constructor, you can mention that bee2_features is the state features we use in Gym environment for IW to dynamically increase state variable intervals. In other domains, other state features might be more suitable.
- "IW algorithm was able to find an efficient solution in less than 1 second": it depends on the computer...
The problem with that kind of link is that it is pointing
That's why i chose not to point to the function but rather to the script as i did not find a way to do that without freezing the code version as you did. |
It was a comment from @neo-alex who succeeded to make it work. However i could not succeed to do it, that's why i did not show it myself. Maybe we can just skip the sentence. |
I removed reference to explicit time. But actually the computation was instantaneous on my laptop and binder, so i felt quite safe telling "less than 1 second". Still i replaced it by "in a very short time". |
Indeed! Also for me with binder oops. By doubling the n_it it seems to work though and i added a warning as suggested. (on my laptop it never failed me, strange...) |
Actually, it depends on episodes. So perhaps it is better to launch several episodes, instead of increasing the number of iterations during solve. |
@fteicht : i think i took into account all your comments. Is it ok to accept the PR ? |
@nhuet Thanks, the PR is fine to me, I am accepting it. |
…CGP and StableBaseline
show rollout as movies
+ split rollout from solve + using own rollout avoid the need for monitor wrapper and ffmpeg + using own rollout fix the display issue for IW (the opengl window stil pops up/closes on local sessions though)
- check goal reached with position - modification for iw as observation is wrapped in a GymDomainStateProxy
- modify summary of PPO - skip mentioning SAC as it is not also working (at least on my laptop) - add a warning that cgp can fail - add explanation about new wrapping class necessary for IW - use qualitative rather than quantitative computation time for IW in conclusion
(removed also from class definition as it is specified directly when instanciating IW solver)
in order to see that goal is not always reached (or not reached)
Notebook presenting how to define a gym environment based domain and solve it with scikit-decide.