
Creating a general reward encouraging a smart exploration of the tree #226

Merged: 24 commits merged into master on May 12, 2022

Conversation

@louis-gautier (Collaborator) commented May 6, 2022

This PR adds a reward called ExperimentalReward (which will have to be renamed to GeneralReward once merged with master).

The goal of this reward is to be as general as possible and to encourage a smart exploration of the tree of solutions of any problem solved using DefaultStateRepresentation. Its behaviour is briefly described below:

  • By design, we make sure that any feasible solution found always receives a higher reward than any infeasible one.
  • We choose to assign a reward at each decision, and not only in the EndingPhase as was previously done with CPReward. This should allow faster learning in large search trees and foster a smart exploration of the tree.

The rewards are given as follows (a rough code sketch follows this list):

  • In all cases:

We encourage the agent to bind as many variables as possible in a single assignment. The agent thus receives a reward of [equation] (with [equation] chosen to make this function convex) at each decision.

At the EndingPhase, the agent receives a penalty of [equation] if it reaches an infeasible solution.

  • If the problem has an objective function, we add the following elements:

We encourage the agent to keep low values in the domain of the objective function throughout the resolution. The agent thus receives a reward of [equation].

At the EndingPhase, the agent receives a penalty of -1 if it reaches an infeasible solution.
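
As a rough illustration only, and not the actual ExperimentalReward code, the shaping described above could be sketched as follows. The function names, the exponent γ, and the normalisation by the total number of variables are assumptions made for this sketch.

const γ = 2.0  # illustrative exponent > 1, which makes the per-decision reward convex

# Reward given at each DecisionPhase: favour decisions that bind many variables at once.
# nbound_before / nbound_after count the bound variables before and after the decision
# (including propagation); nvars is the total number of variables of the problem.
function decision_reward(nbound_before, nbound_after, nvars)
    return ((nbound_after - nbound_before) / nvars)^γ
end

# Extra shaping when the problem has an objective function: reward any tightening of the
# objective domain, normalised by its initial span.
function objective_reward(obj_max_before, obj_max_after, obj_initial_span)
    return (obj_max_before - obj_max_after) / obj_initial_span
end

# Reward given in the EndingPhase: an infeasible leaf receives a penalty, feasible leaves do not.
ending_reward(feasible::Bool) = feasible ? 0.0 : -1.0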

@3rdCore (Collaborator) commented May 6, 2022

It would be nice to have some comparison with other rewards known to work on specific problems (tsptw, graphColoring, ...) before merging. What do you think of this?

@louis-gautier (Collaborator, Author) commented:

> It would be nice to have some comparison with other rewards known to work on specific problems (tsptw, graphColoring, ...) before merging. What do you think of this?

Sure. Here is a comparison of GeneralReward ("learned") and CPReward ("learned2") on graphcoloring(20) with the default hyperparameters for both rewards.

(Comparison plot: GeneralReward vs. CPReward on graphcoloring(20).)

On TSPTW(10), however, no learning can be observed (the rewards are consistent but the agent does not learn), probably due to pitfalls in the treatment of the representation, as the graph is very large for TSPTW.

@malikattalah has worked on comparing the two rewards on nqueens.

@louis-gautier marked this pull request as ready for review on May 9, 2022, at 13:43
@gostreap (Collaborator) left a comment


The reward looks very well designed to me and will probably improve performance compared to previous general rewards. I look forward to this being merged into the main branch.

However, before that, I think it is necessary to clean up the code a bit, especially by removing all the prints that were only needed during development of the reward.

It would also be nice to add some unit tests to keep the project on the right track in that respect.

Review comments (all resolved) were left on the following files:

  • src/CP/core/search/dfs.jl (2 comments)
  • src/CP/valueselection/learning/rewards/generalreward.jl (3 comments)
  • src/experiment/metrics/basicmetrics.jl (3 comments)
@@ -53,3 +53,44 @@ function (nn::CPNN)(states::BatchedDefaultTrajectoryState)
return output
end
end

# Overloads the Base.string() function for storing parameters of the neural networks associated to experiments.
function Base.string(nn::CPNN)
Collaborator commented:

Do we really need this, as long as CPNN will be partially refactored?

Collaborator Author replied:

It has nothing to do with the reward, but it is really useful for keeping track of the neural networks we use in experiments (the returned string is saved in the params.json files). To run experiments on the new heterogeneous pipeline we have duplicated CPNN, so I think we should keep it like this for the moment.
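
For context, such an overload can be sketched generically as below; this is not the code added in this diff, and the exact string format written to the params.json files is an assumption.

# Generic sketch of a Base.string overload for CPNN; the actual overload in this PR
# may list the fields and format them differently. The resulting string is meant to be
# stored in the experiment's params.json file to document the network architecture.
function Base.string(nn::CPNN)
    parts = [string(f) * "=" * string(getfield(nn, f)) for f in fieldnames(typeof(nn))]
    return "CPNN(" * join(parts, ", ") * ")"
end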

@3rdCore (Collaborator) commented May 11, 2022

I totally agree with @gostreap regarding the testsets!

@louis-gautier requested a review from gostreap on May 11, 2022, at 21:17
@louis-gautier (Collaborator, Author) commented:

Unit tests have been added for this reward. We should soon be able to merge, which will help us conduct tests on problems with objective variables for the new heterogeneous graph pipeline.
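
As a purely hypothetical sketch of such a testset, reusing the helper functions from the earlier reward sketch (the tests actually added in this PR may check different quantities):

using Test

@testset "GeneralReward sanity checks" begin
    nvars = 10
    # Binding more variables in a single decision should never give a smaller reward.
    @test decision_reward(0, 5, nvars) >= decision_reward(0, 1, nvars)
    # Tightening the objective domain yields a non-negative reward.
    @test objective_reward(20, 15, 30) >= 0
    # An infeasible leaf must receive a lower ending reward than a feasible one.
    @test ending_reward(false) < ending_reward(true)
end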

@louis-gautier merged commit e0fd95e into master on May 12, 2022