Maze notebook: educational version #65

nhuet · 2021-09-30T08:06:43Z

No description provided.

dbarbier · 2021-09-30T08:26:51Z

@nhuet Can you please provide a link to binder?

nhuet · 2021-09-30T10:02:35Z

notebooks/maze_utils.py

galleon · 2021-09-30T17:50:58Z

Here is the link using the local (on this repo) binder env.

notebooks/maze_tuto.ipynb

galleon · 2021-09-30T17:59:30Z

notebooks/maze_tuto.ipynb

+    "\n",
+    "    # Initialize image\n",
+    "    figure = domain.render(observation)\n",
+    "    display(figure)\n",


Use display(domain.render(observation)) as earlier - same later.

galleon · 2021-09-30T18:00:49Z

notebooks/maze_tuto.ipynb

+    "        max_steps=max_steps,\n",
+    "        max_framerate=None,\n",
+    "        outcome_formatter=None,\n",
+    "        render=False,\n",


I did change this to True (assuming a user would try to do so), but the graphic is not displayed.

I think it would be best to solve and then write our own rollout.

I did that (writing the rollout), but in A* (following the narratives). I can put it here if necessary.
The bad part is that we need to do that at the same time as the solve because of the "with ..." syntax that scikit-decide wants to enforce when using solvers.

The basic usage is to call rollout from inside a context manager; it is okay IMO to emulate it without a context manager, but the important point is to call solver.close() at the end.
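
To make the two options concrete, here is a minimal sketch of both patterns. It uses a hypothetical `DummySolver` stand-in (an assumption for illustration, not the scikit-decide API); the only point is that `close()` ends up being called either way.

```python
class DummySolver:
    """Hypothetical stand-in for a solver; NOT the scikit-decide API."""

    def __init__(self):
        self.closed = False

    def solve(self):
        # Placeholder for the actual solving step.
        return "plan"

    def close(self):
        # Release whatever resources the solver holds.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


# Pattern 1: context manager, cleanup is automatic on leaving the block.
with DummySolver() as solver_a:
    plan_a = solver_a.solve()

# Pattern 2: no context manager, so solving and rollout can live in
# separate notebook cells; close() must then be called explicitly.
solver_b = DummySolver()
try:
    plan_b = solver_b.solve()
    # ... a rollout would go here, possibly in a later cell ...
finally:
    solver_b.close()
```

The second pattern trades the automatic cleanup for the freedom to split the steps, which is why forgetting `close()` is the main risk.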

dbarbier · 2021-10-01T12:53:48Z

notebooks/maze_tuto.ipynb

+    "        Here Manhattan distance to goal.\n",
+    "\n",
+    "        \"\"\"\n",
+    "        return Value(cost=sqrt((self.end.x - s.x) ** 2 + (self.end.y - s.y) ** 2))"


This is the usual Euclidean distance, not Manhattan. BTW do we really need sqrt?

Uh, you are right of course. I just copy-pasted a comment from Alex without thinking further... my bad.

It would be better to use Manhattan distance for a maze (and it works):
return Value(cost=abs(self.end.x - s.x) + abs(self.end.y - s.y))

@fteicht said to use Euclidean distance because it is:

* simpler to understand for everyone
* consistent with @neo-alex's tutorial
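
For reference, the candidate heuristics discussed in this thread can be compared side by side. This is a minimal sketch with a hypothetical `State` namedtuple (the notebook's own state class may differ), and it assumes unit-cost moves in the four grid directions.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical minimal state type; the notebook's own State class may differ.
State = namedtuple("State", ["x", "y"])


def euclidean(s, goal):
    """Straight-line distance to the goal."""
    return sqrt((goal.x - s.x) ** 2 + (goal.y - s.y) ** 2)


def manhattan(s, goal):
    """Sum of the horizontal and vertical offsets to the goal."""
    return abs(goal.x - s.x) + abs(goal.y - s.y)


def squared_euclidean(s, goal):
    """Euclidean distance without the sqrt."""
    return (goal.x - s.x) ** 2 + (goal.y - s.y) ** 2


start, goal = State(0, 0), State(3, 4)
print(euclidean(start, goal))          # 5.0
print(manhattan(start, goal))          # 7
print(squared_euclidean(start, goal))  # 25
```

Under the stated assumption, at least 7 moves are needed here even with no walls, so the Euclidean and Manhattan values never overestimate the remaining cost, while the squared value (25) does. That suggests the sqrt is needed if the heuristic is to stay admissible for A*.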

nhuet · 2021-10-01T14:02:21Z

@neo-alex @fteicht: I think I have taken all comments into account, so it is now your turn to review.
NB:

* missing explanation texts are shown via placeholders like this: text needed (bold + italic)
* the syntax using with for the solver prevents splitting the solver part from the rollout part -> do you think it would be a good idea to split them, even though that means not using your preferred syntax? (you will also see that my explanation of that syntax is quite clumsy)
* we should also think about fixing an appropriate total_timesteps for PPO so that it remains clear that it is not going to work, while staying fast to compute

nhuet · 2021-10-04T08:59:38Z

Finally, I decided to simplify the explanations by separating solve and rollout into 2 different cells. Still, I added a comment at the end to explain how to call cleanup automatically thanks to with. You will see that this last commit makes the notebook much more readable (imho).

nhuet · 2021-10-07T08:14:45Z

Fixing #50

nhuet · 2021-10-18T10:27:10Z

I added references for A* and PPO solvers.

Some explanations are still needed (see bold italic placeholders). To simplify, I suggest skipping the part about "key concepts" and adding explanations where needed in each subsection of "MazeDomain definition".

I would also like to explain why the syntax to solve is "DomainClass.solve_with(solver, domain_factory)" rather than "solver.solve(domain_factory)", which would be more obvious for a new user. We have to explain the need to call the domain class here, I think.

nhuet · 2021-10-25T21:55:23Z

I prefixed the name with "1_" so that it will be seen as first notebook in examples list by PR #85.
Thus, new link to launch on binder:

fteicht

Thank you for this amazing notebook!
Here are a few minor comments:

remove the '.' at the end of the first bullet sentence in the introduction text
change "About maze problem" for "About the maze problem"
change "bottom_right" for "bottom-right" in the paragraph titled "About maze problem"
change "recognized as scikit-decide domain" for "recognized as a scikit-decide domain"
change "We also specify type of states [...]" for "We also specify the types of states [...]"
should we explain why this is a DeterministicPlanningDomain? (deterministic starting states and transitions, white box transition model, definition of goal states)
"Perhaps more details about why this intermediate class D is needed (autocast?)" => yes, I believe this is definitely required, I let @neo-alex comment on this.
"Text needed to explain why methods to overload have leading underscores" => I guess @nhuet will do it otherwise please ask @neo-alex to comment
"we will need a domain factory recreating the domain at will": we can mention that it is for instance useful for parallel solvers which create identical domains on separate processes by using the domain factory
change "allowing to use" for "allowing us to use"
change "that is make use of" for "that makes use of"
change "which give access to" for either "which gives access to" or "giving access to"
change "Here we choose Proximal [...]" for "Here we choose the Proximal [...]"
"Explanation needed about the not intuitive syntax to use here (needed for autocast?)" => ask @neo-alex to explain
change "Rolling out a solution" for "Rolling out the solution (found by PPO)"
change "see utils.py module" for "see the utils.py module"
"This is probably because it is not using the domain characteristics to their finer level.": this is actually due to the fact that the reward is sparse (you get rewarded only when you reach the goal), and it is nearly impossible for this kind of RL algorithm to reach the goal just by chance without shaping the reward.
"well known to be suited to this kind of problem": because it exploits the knowledge of the goal and of heuristic metrics to reach the goal (e.g. Euclidean or Manhattan distance)
"previously defined heuristic": I can't see where it was previously defined or presented
change "Training solver on domain" for "Training the solver on the domain"
change "Rolling out a solution" for "Rolling out the solution (found by Astar)"
change "more infos" for "more information"
change "all reward" for "all rewards"
In the list of reasons why Astar performs well on the maze, we should mention that it exploits the knowledge of the existence of the goal (aka the sparse reward in this domain)
change "from RL community" for "from the RL community"
change "That's why" for "That is why"
"define the domain with the finer granularity possible": and also to use the solvers that can exploit at most the known characteristics of the domain

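The point above about the domain factory can be illustrated with a minimal sketch. `ToyMazeDomain` is a hypothetical stand-in (an assumption, not the scikit-decide MazeDomain); the idea is only that a factory is a zero-argument callable returning a fresh, identically configured domain each time, so each process can own its copy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMazeDomain:
    """Hypothetical stand-in for a domain; NOT the scikit-decide MazeDomain."""

    size: int = 5
    visited: list = field(default_factory=list)


# A factory is just a zero-argument callable that builds a fresh domain.
domain_factory = lambda: ToyMazeDomain(size=5)

# Each worker would call the factory to get its own copy.
domain_a = domain_factory()
domain_b = domain_factory()

# Mutating one copy leaves the other untouched.
domain_a.visited.append((0, 0))
```

Passing a single shared instance instead would let one worker's state changes leak into another's, which is what the factory avoids.
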
fteicht · 2021-11-04T18:17:34Z

After reading the conclusion of this notebook, I am thinking that we should maybe create another notebook in the future to highlight a problem where RL is especially better than other methods, because at the moment we only have tutorial notebooks which seem to constantly demonstrate the opposite. On pure control problems with continuous rewards (e.g. CartPole), RL is typically better, and this is not the only case. It's obviously true for problems for which we don't know the model, but in that case there won't be many other options than RL either.

nhuet · 2021-11-05T10:47:38Z

I made all the changes requested by @fteicht. Only two explanations now remain, both left to @neo-alex:

* details about why this intermediate class D is needed (autocast?)
* explanation of the solve_with syntax: why is it called from the domain class and not from the instance or even the solver?
  I think this is also for the autocast feature, but I am not sure how to explain it properly.
  Indeed, for a new user, a more intuitive syntax would be solver.solve(domain_factory=lambda: domain), as @POPOGO pointed out to me.

dbarbier · 2021-11-05T11:32:32Z

[...]

* explanation about solve_with syntax : why this is called from domain class and not instance or even solver ?
  I think this is also for the autocast feature, but not sure how to explain it properly.
  Indeed for a new user, a more intuitive syntax would be `solver.solve(domain_factory=lambda: domain)` as @POPOGO pointed out to me.

This is explained below in the "Cleaning up the solver" section, isn't it?

nhuet · 2021-11-05T14:07:05Z

This is explained below in the "Cleaning up the solver" section, isn't it?

Nope. What you are talking about is the syntax involving "with ...". What I was talking about is why we use

MazeDomain.solve_with(solver, domain_factory)

instead of

solver.solve(domain_factory)

The latter is more intuitive to my mind. In particular, why do we need to call the domain class and not the instance itself?
Actually, @POPOGO even told me he was using the solver.solve(domain_factory) syntax, but that is not how it should be done, if I remember correctly. I would like to explain exactly why here (and thus I need @neo-alex's help to understand it perfectly).

neo-alex

Just one mistake to correct please (see comment) and linting to apply, then OK for merge!

neo-alex · 2021-11-10T18:07:35Z

notebooks/1_maze_tuto.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class MazeDomain(DeterministicPlanningDomain, UnrestrictedActions, Renderable):\n",


Should be class MazeDomain(D) instead

Oops, again a test that remained behind, my bad... Changed!

Main features:

* the code is sequential (i.e. one solver is tested after another)
* maze-only code (drawing, generation) is put in a separate module
* implementation of our own rollout() to show how it works
* references to relevant articles for each solver + some brief explanation
* use of the C++ A* (rather than the Python one) + Euclidean distance as heuristic (easier to understand than Manhattan distance)

nhuet marked this pull request as draft September 30, 2021 08:06

nhuet force-pushed the maze_nb_seq branch 2 times, most recently from 7fd5c9a to e38c6b3 Compare September 30, 2021 10:02

dbarbier reviewed Sep 30, 2021

nhuet marked this pull request as ready for review September 30, 2021 17:03

galleon reviewed Sep 30, 2021

galleon reviewed Sep 30, 2021

galleon reviewed Sep 30, 2021

dbarbier reviewed Oct 1, 2021

nhuet mentioned this pull request Oct 5, 2021

** Maze Notebook ** #50

Closed

nhuet force-pushed the maze_nb_seq branch from e3af462 to d0d3f75 Compare October 7, 2021 13:15

galleon mentioned this pull request Oct 8, 2021

Maze notebook #47

Closed

nhuet force-pushed the maze_nb_seq branch 5 times, most recently from b276d85 to 247ec4d Compare October 22, 2021 13:42

dbarbier mentioned this pull request Oct 25, 2021

Deployment of linters #84

Closed

10 tasks

nhuet mentioned this pull request Oct 25, 2021

Integration of new notebooks in doc #85

Closed

nhuet force-pushed the maze_nb_seq branch 2 times, most recently from 386a038 to 5047f51 Compare October 26, 2021 21:27

nhuet mentioned this pull request Oct 27, 2021

ci-cd: add env variables needed to generate binder links in online doc #89

Merged

nhuet force-pushed the maze_nb_seq branch from 5047f51 to 119adf7 Compare November 2, 2021 15:09

fteicht self-requested a review November 4, 2021 17:32

fteicht requested changes Nov 4, 2021

neo-alex requested changes Nov 10, 2021

nhuet force-pushed the maze_nb_seq branch from a186e54 to e63ebac Compare November 12, 2021 11:16

galleon merged commit 21a2b08 into airbus:master Nov 12, 2021

nhuet deleted the maze_nb_seq branch November 26, 2021 08:58
