Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maze notebook: educational version #65

Merged
merged 1 commit into from
Nov 12, 2021
Merged

Conversation

nhuet
Copy link
Contributor

@nhuet nhuet commented Sep 30, 2021

No description provided.

@nhuet nhuet marked this pull request as draft September 30, 2021 08:06
@dbarbier
Copy link
Contributor

@nhuet Can you please provide a link to binder?

@nhuet nhuet force-pushed the maze_nb_seq branch 2 times, most recently from 7fd5c9a to e38c6b3 Compare September 30, 2021 10:02
@nhuet
Copy link
Contributor Author

nhuet commented Sep 30, 2021

Binder

notebooks/maze_utils.py Outdated Show resolved Hide resolved
@nhuet nhuet marked this pull request as ready for review September 30, 2021 17:03
@galleon
Copy link
Contributor

galleon commented Sep 30, 2021

Here is the link using the local (on this repo) binder env.

Binder

notebooks/maze_tuto.ipynb Outdated Show resolved Hide resolved
notebooks/maze_tuto.ipynb Outdated Show resolved Hide resolved
"\n",
" # Initialize image\n",
" figure = domain.render(observation)\n",
" display(figure)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use display(domain.render(observation)) as earlier - same later.

" max_steps=max_steps,\n",
" max_framerate=None,\n",
" outcome_formatter=None,\n",
" render=False,\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did change this to True ... (assuming a user would try to do so) but the graphic is not displayed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to solve and then write our own roll out ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that, (writting the rollout) but in A* (following the narratives). I can put it here if necessary.
The bad part is that we need to do that at the same time as the solve because of this "with ..." syntax that scikit-decide want to enforce when using solvers.

Copy link
Contributor

@dbarbier dbarbier Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic usage is to call rollout from inside a context manager; it is okay IMO to emulate it without a context manager, but the important point is to call solver.close() at the end.

" Here Manhattan distance to goal.\n",
"\n",
" \"\"\"\n",
" return Value(cost=sqrt((self.end.x - s.x) ** 2 + (self.end.y - s.y) ** 2))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is usual Euclidean distance, not Manhattan. BTW do we really need sqrt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Euh you are right of course. I juste copy/paste a comment from Alex without thinking further... my bad

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to use Manhattan distance for a Maze (and it works):
return Value(cost=abs(self.end.x - s.x) + abs(self.end.y - s.y))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fteicht said to use eucliean distance because:

  • simpler to understand for everyone
  • consistent with @neo-alex tutorial

@nhuet
Copy link
Contributor Author

nhuet commented Oct 1, 2021

@neo-alex @fteicht : i think i took all commentary made into account, so it should be now your turn to review.
NB:

  • missing explanation texts are shown via placeholders like that: text needed (bold + italic)
  • the syntax using with for the solver prevent splitting solver part from rollout part -> do you think it would be a good idea to split it, even though not using your preferred syntax? (you will also see that my explanation about that syntax is quite clumsy)
  • we should also think to fix an appropriate total_timesteps for ppo so that it is still clear that is not about to work, and fast to compute at the same time

@nhuet
Copy link
Contributor Author

nhuet commented Oct 4, 2021

Finally I decided to simplify explanations by separating solve and rollout in 2 different cells. Still i added a comment at the end to explain how to automatically call cleanup thanks to with. You will see how this last commit makes the notebook much more readable (imho).

@nhuet nhuet mentioned this pull request Oct 5, 2021
@nhuet
Copy link
Contributor Author

nhuet commented Oct 7, 2021

Fixing #50

@galleon galleon mentioned this pull request Oct 8, 2021
@nhuet
Copy link
Contributor Author

nhuet commented Oct 18, 2021

I added references for A* and PPO solvers.

Some explanations are still needed (see bold italic placeholders). To simplify, I suggest to skip the part about "key concepts" and add explanations if needed in each subsection of "MazeDomain definition".

I would also like explaining why the syntax to solve is "DomainClass.solve_with(solver, domain_factory)" rather than "solver.solve(domain_factory)" which would be more obvious for a new user. We have to explain the need to call the Domain Class here i think.

@nhuet nhuet force-pushed the maze_nb_seq branch 5 times, most recently from b276d85 to 247ec4d Compare October 22, 2021 13:42
@dbarbier dbarbier mentioned this pull request Oct 25, 2021
10 tasks
@nhuet
Copy link
Contributor Author

nhuet commented Oct 25, 2021

I prefixed the name with "1_" so that it will be seen as first notebook in examples list by PR #85.
Thus, new link to launch on binder: Binder

@nhuet nhuet force-pushed the maze_nb_seq branch 2 times, most recently from 386a038 to 5047f51 Compare October 26, 2021 21:27
Copy link
Collaborator

@fteicht fteicht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this amazing notebook!
Here are a few minor comments:

  • remove the '.' at the end of the first bullet sentence in the introduction text
  • change "About maze problem" for "About the maze problem"
  • change "bottom_right" for "bottom-right" in the paragraph titled "About maze problem"
  • change "recognized as scikit-decide domain" for "recognized as a scikit-decide domain"
  • change "We also specify type of states [...]" for "We also specify the types of states [...]"
  • should we explain why this is a DeterministicPlanningDomain? (deterministic starting states and transitions, white box transition model, definition of goal states)
  • "Perhaps more details about why this intermediate class D is needed (autocast?)" => yes, I believe this is definitely required, I let @neo-alex comment on this.
  • "Text needed to explain why methods to overload have leading underscores" => I guess @nhuet will do it otherwise please ask @neo-alex to comment
  • "we will need a domain factory recreating the domain at will": we can mention that it is for instance useful for parallel solvers which create identical domains on separate processes by using the domain factory
  • change "allowing to use" for "allowing us to use"
  • change "that is make use of" for "that makes use of"
  • change "which give access to" for either "which gives access to" or "giving access to"
  • change "Here we choose Proximal [...]" for "Here we choose the Proximal [...]"
  • "Explanation needed about the not intuitive syntax to use here (needed for autocast?)" => ask @neo-alex to explain
  • change "Rolling out a solution" for "Rolling out the solution (found by PPO)"
  • change "see utils.py module" for "see the utils.py module"
  • "This is probably because it is not using the domain characteristics to their finer level.": this is actually to the fact that he reward is sparse (you get rewarded only when you reach the goal) and this is nearly impossible for this kind of RL algorithm to reach the goal just by chance without shaping the reward.
  • "well known to be suited to this kind of problem": because it exploits the knowledge of the goal and of heuristic metrics to reach the goal (e.g. euclidean or Manhattan distance)
  • "previously defined heuristic": I can't see where it was previously defined or presented
  • change "Training solver on domain" for "Training the solver on the domain"
  • change "Rolling out a solution" for "Rolling out the solution (found by Astar)"
  • change "more infos" for "more information"
  • change "all reward" for "all rewards"
  • In the list of reasons why Astar performs well on the maze, we should mention that it exploits the knowledge of the existence of the goal (aka the space reward in this domain)
  • change "from RL community" for "from the RL community"
  • change "That's why" for "That is why"
  • "define the domain with the finer granularity possible": and also to use the solvers that can exploit at most the known characteristics of the domain

@fteicht
Copy link
Collaborator

fteicht commented Nov 4, 2021

After reading the conclusion of this notebook, I am thinking that we should maybe create another notebook in the future to highlight a problem where RL is especially better than other methods. Because at the moment we only have tutorial notebooks which seem to constantly demonstrate the opposite. On pure control problems with continuous rewards (e.g. CartPole), RL is typically better, and this not the only case. It's obviously true for problems for which we don't know the model - but in that case they won't be many other options than RL neither.

@nhuet
Copy link
Contributor Author

nhuet commented Nov 5, 2021

I made all changes requested by @fteicht . It remains now only two explanations let to @neo-alex :

  • details about why this intermediate class D is needed (autocast?)
  • explanation about solve_with syntax : why this is called from domain class and not instance or even solver ?
    I think this is also for the autocast feature, but not sure how to explain it properly.
    Indeed for a new user, a more intuitive syntax would be solver.solve(domain_factory=lambda: domain) as @POPOGO pointed out to me.

@dbarbier
Copy link
Contributor

dbarbier commented Nov 5, 2021

[...]

* explanation about solve_with syntax : why this is called from domain class and not instance or even solver ?
  I think this is also for the autocast feature, but not sure how to explain it properly.
  Indeed for a new user, a more intuitive syntax would be `solver.solve(domain_factory=lambda: domain)` as @POPOGO pointed out to me.

This is explained below in the "Cleaning up the solver" section, doesn't it?

@nhuet
Copy link
Contributor Author

nhuet commented Nov 5, 2021

This is explained below in the "Cleaning up the solver" section, doesn't it?

Nope. What you are talking about is the syntax involving "with ...". What I was talking about is why using

MazeDomain.solve_with(solver, domain_factory)

instead of

solver.solve(domain_factory)

The latter being more intuitive to my mind. Especially why the need to call the domain class and not the instance itself.
Actually @POPOGO even told me he was using solver.solve(domain_factory) syntax but it is not how it should be done if i remember. And i would like to explain exactly why here. (and thus i need @neo-alex help to understand it perfectly)

Copy link
Collaborator

@neo-alex neo-alex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one mistake to correct please (see comment) and linting to apply, then OK for merge!

"metadata": {},
"outputs": [],
"source": [
"class MazeDomain(DeterministicPlanningDomain, UnrestrictedActions, Renderable):\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be class MazeDomain(D) instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, again a test that remained behind, my bad... Changed!

Main features:
- the code is  sequential (ie one solver after one is tested)
- maze only related code (drawing, generation) put in a separate module
- implementation of our own rollout() to show how it works
- references to relevant articles for each solver + some brief
  explanation
- use of c++ A* (rather than python one) + euclidean distance as
  heuristic (easier to understand thatn manhattan distance)
@galleon galleon merged commit 21a2b08 into airbus:master Nov 12, 2021
@nhuet nhuet deleted the maze_nb_seq branch November 26, 2021 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants