The title might not be explicit enough, so let me explain this point.
When training an agent to play MountainCar, CartPole, CarRacing, etc., the best scores it can get are rather similar from one episode to another. With the generator we have (at least for graph coloring at the moment), the difficulty can vary a lot from one episode to another (in terms of the number of nodes visited). This can make a good policy appear bad, which we should avoid!
A rather simple solution would be to use a simple heuristic to give insight into the current episode (it is indeed very hard to control the difficulty from the generator's point of view).
This would not be supervised learning at all, as we are not trying to imitate what the heuristic does; we are just using the heuristic to give us more information about what is happening.
The time spent on the heuristic search will not make our experiments blow up, since the training process is much bigger in terms of time consumption.
This is a first basic idea which can certainly be refined later!
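Here is a small sketch of what this could look like, in Python and with hypothetical names (`heuristic_solve`, `agent_solve`, and the returned keys are placeholders, not the project's actual API): run the cheap hand-coded heuristic on the same generated instance and use its node count as a baseline to put the agent's raw score in perspective.

```python
from typing import Any, Callable, Dict

def evaluate_episode(
    instance: Any,
    heuristic_solve: Callable[[Any], int],  # nodes visited by the simple heuristic
    agent_solve: Callable[[Any], int],      # nodes visited by the learned policy
) -> Dict[str, float]:
    baseline_nodes = heuristic_solve(instance)
    agent_nodes = agent_solve(instance)
    return {
        "baseline_nodes": float(baseline_nodes),
        "agent_nodes": float(agent_nodes),
        # The difference filters out the instance-to-instance difficulty
        # swings coming from the generator.
        "delta": float(agent_nodes - baseline_nodes),
        "relative": agent_nodes / max(baseline_nodes, 1),
    }
```

The `delta` (or the ratio) is what we would report instead of the raw node count, so that a hard instance does not make the policy look worse than it actually is.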
The reward could actually be the difference between the simple heuristic and the LearnedHeuristic, i.e. what is called "Delta" in the display during training?
That would be better, I think, since what we actually track is that Delta.
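Concretely, and still with hypothetical names rather than the project's actual API, the terminal reward for an episode could simply be that Delta:

```python
# Hedged sketch: use the "Delta" as the terminal reward of the episode.
# Positive when the learned policy visits fewer nodes than the simple
# heuristic on the same instance, negative otherwise.
def delta_reward(baseline_nodes: int, agent_nodes: int) -> float:
    return float(baseline_nodes - agent_nodes)
```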
I liked the fact that it could reach approximately the same level as the minimum heuristic without having any information about it, though.