-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maze notebook: educational version #65
Conversation
@nhuet Can you please provide a link to binder? |
7fd5c9a
to
e38c6b3
Compare
notebooks/maze_tuto.ipynb
Outdated
"\n", | ||
" # Initialize image\n", | ||
" figure = domain.render(observation)\n", | ||
" display(figure)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use display(domain.render(observation)) as earlier - same later.
notebooks/maze_tuto.ipynb
Outdated
" max_steps=max_steps,\n", | ||
" max_framerate=None,\n", | ||
" outcome_formatter=None,\n", | ||
" render=False,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did change this to True ... (assuming a user would try to do so) but the graphic is not displayed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to solve and then write our own roll out ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that, (writting the rollout) but in A* (following the narratives). I can put it here if necessary.
The bad part is that we need to do that at the same time as the solve because of this "with ..." syntax that scikit-decide want to enforce when using solvers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The basic usage is to call rollout from inside a context manager; it is okay IMO to emulate it without a context manager, but the important point is to call solver.close()
at the end.
notebooks/maze_tuto.ipynb
Outdated
" Here Manhattan distance to goal.\n", | ||
"\n", | ||
" \"\"\"\n", | ||
" return Value(cost=sqrt((self.end.x - s.x) ** 2 + (self.end.y - s.y) ** 2))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is usual Euclidean distance, not Manhattan. BTW do we really need sqrt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Euh you are right of course. I juste copy/paste a comment from Alex without thinking further... my bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to use Manhattan distance for a Maze (and it works):
return Value(cost=abs(self.end.x - s.x) + abs(self.end.y - s.y))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neo-alex @fteicht : i think i took all commentary made into account, so it should be now your turn to review.
|
Finally I decided to simplify explanations by separating solve and rollout in 2 different cells. Still i added a comment at the end to explain how to automatically call cleanup thanks to |
Fixing #50 |
I added references for A* and PPO solvers. Some explanations are still needed (see bold italic placeholders). To simplify, I suggest to skip the part about "key concepts" and add explanations if needed in each subsection of "MazeDomain definition". I would also like explaining why the syntax to solve is "DomainClass.solve_with(solver, domain_factory)" rather than "solver.solve(domain_factory)" which would be more obvious for a new user. We have to explain the need to call the Domain Class here i think. |
b276d85
to
247ec4d
Compare
I prefixed the name with "1_" so that it will be seen as first notebook in examples list by PR #85. |
386a038
to
5047f51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this amazing notebook!
Here are a few minor comments:
- remove the '.' at the end of the first bullet sentence in the introduction text
- change "About maze problem" for "About the maze problem"
- change "bottom_right" for "bottom-right" in the paragraph titled "About maze problem"
- change "recognized as scikit-decide domain" for "recognized as a scikit-decide domain"
- change "We also specify type of states [...]" for "We also specify the types of states [...]"
- should we explain why this is a
DeterministicPlanningDomain
? (deterministic starting states and transitions, white box transition model, definition of goal states) - "Perhaps more details about why this intermediate class D is needed (autocast?)" => yes, I believe this is definitely required, I let @neo-alex comment on this.
- "Text needed to explain why methods to overload have leading underscores" => I guess @nhuet will do it otherwise please ask @neo-alex to comment
- "we will need a domain factory recreating the domain at will": we can mention that it is for instance useful for parallel solvers which create identical domains on separate processes by using the domain factory
- change "allowing to use" for "allowing us to use"
- change "that is make use of" for "that makes use of"
- change "which give access to" for either "which gives access to" or "giving access to"
- change "Here we choose Proximal [...]" for "Here we choose the Proximal [...]"
- "Explanation needed about the not intuitive syntax to use here (needed for autocast?)" => ask @neo-alex to explain
- change "Rolling out a solution" for "Rolling out the solution (found by PPO)"
- change "see utils.py module" for "see the utils.py module"
- "This is probably because it is not using the domain characteristics to their finer level.": this is actually to the fact that he reward is sparse (you get rewarded only when you reach the goal) and this is nearly impossible for this kind of RL algorithm to reach the goal just by chance without shaping the reward.
- "well known to be suited to this kind of problem": because it exploits the knowledge of the goal and of heuristic metrics to reach the goal (e.g. euclidean or Manhattan distance)
- "previously defined heuristic": I can't see where it was previously defined or presented
- change "Training solver on domain" for "Training the solver on the domain"
- change "Rolling out a solution" for "Rolling out the solution (found by Astar)"
- change "more infos" for "more information"
- change "all reward" for "all rewards"
- In the list of reasons why Astar performs well on the maze, we should mention that it exploits the knowledge of the existence of the goal (aka the space reward in this domain)
- change "from RL community" for "from the RL community"
- change "That's why" for "That is why"
- "define the domain with the finer granularity possible": and also to use the solvers that can exploit at most the known characteristics of the domain
After reading the conclusion of this notebook, I am thinking that we should maybe create another notebook in the future to highlight a problem where RL is especially better than other methods. Because at the moment we only have tutorial notebooks which seem to constantly demonstrate the opposite. On pure control problems with continuous rewards (e.g. CartPole), RL is typically better, and this not the only case. It's obviously true for problems for which we don't know the model - but in that case they won't be many other options than RL neither. |
I made all changes requested by @fteicht . It remains now only two explanations let to @neo-alex :
|
[...]
This is explained below in the "Cleaning up the solver" section, doesn't it? |
Nope. What you are talking about is the syntax involving "with ...". What I was talking about is why using MazeDomain.solve_with(solver, domain_factory) instead of solver.solve(domain_factory) The latter being more intuitive to my mind. Especially why the need to call the domain class and not the instance itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one mistake to correct please (see comment) and linting to apply, then OK for merge!
notebooks/1_maze_tuto.ipynb
Outdated
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"class MazeDomain(DeterministicPlanningDomain, UnrestrictedActions, Renderable):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be class MazeDomain(D)
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, again a test that remained behind, my bad... Changed!
Main features: - the code is sequential (ie one solver after one is tested) - maze only related code (drawing, generation) put in a separate module - implementation of our own rollout() to show how it works - references to relevant articles for each solver + some brief explanation - use of c++ A* (rather than python one) + euclidean distance as heuristic (easier to understand thatn manhattan distance)
No description provided.