
Creating a general reward encouraging a smart exploration of the tree #226

Merged: 24 commits merged into master on May 12, 2022

Conversation

@louis-gautier (Collaborator) commented May 6, 2022

This PR adds a reward called ExperimentalReward (which will have to be renamed to GeneralReward once merged with master).

The goal of this reward is to be as general as possible and to encourage a smart exploration of the tree of solutions of any problem solved using DefaultStateRepresentation. Its behaviour is briefly described below:

  • By design, we make sure that any feasible solution found always receives a higher reward than any infeasible one.
  • We choose to assign a reward at each decision, and not only in the EndingPhase as was previously done with CPReward. This should allow faster learning in large search trees and foster a smart exploration of the tree.

The rewards are given as follows (a rough code sketch follows this list):

  • In all cases:

We encourage the agent to bind as many variables as possible in a single assignment. The agent thus receives a reward of [equation] (with [equation] chosen to make this function convex) at each decision.

At the EndingPhase, the agent receives a penalty of [equation] if it reaches an infeasible solution.

  • If the problem has an objective function, we add the following elements:

We encourage the agent to keep low values in the domain of the objective function throughout the resolution. The agent thus receives a reward of [equation].

At the EndingPhase, the agent receives a penalty of -1 if it reaches an infeasible solution.
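
As a rough illustration only, and not the actual ExperimentalReward code, the shaping described above could be sketched as follows. The function names, the exponent γ, and the normalisation by the total number of variables are assumptions made for this sketch.

const γ = 2.0  # illustrative exponent > 1, which makes the per-decision reward convex

# Reward given at each DecisionPhase: favour decisions that bind many variables at once.
# nbound_before / nbound_after count the bound variables before and after the decision
# (including propagation); nvars is the total number of variables of the problem.
function decision_reward(nbound_before, nbound_after, nvars)
    return ((nbound_after - nbound_before) / nvars)^γ
end

# Extra shaping when the problem has an objective function: reward any tightening of the
# objective domain, normalised by its initial span.
function objective_reward(obj_max_before, obj_max_after, obj_initial_span)
    return (obj_max_before - obj_max_after) / obj_initial_span
end

# Reward given in the EndingPhase: an infeasible leaf receives a penalty, feasible leaves do not.
ending_reward(feasible::Bool) = feasible ? 0.0 : -1.0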

@3rdCore (Collaborator) commented May 6, 2022

It would be nice to have some comparison with other rewards known to work on specific problems (tsptw, graphColoring, ...) before merging. What do you think of this?

@louis-gautier (Collaborator, Author) commented:

> It would be nice to have some comparison with other rewards known to work on specific problems (tsptw, graphColoring, ...) before merging. What do you think of this?

Sure. Here is a comparison of GeneralReward ("learned") and CPReward ("learned2") on graphcoloring(20) with the default hyperparameters for both rewards.

(Comparison plot: GeneralReward vs. CPReward on graphcoloring(20).)

On TSPTW(10), however, no learning can be observed (the rewards are consistent but the agent does not learn), probably due to pitfalls in the treatment of the representation, as the graph is very large for TSPTW.

@malikattalah has worked on comparing the two rewards on nqueens.

@louis-gautier marked this pull request as ready for review on May 9, 2022, at 13:43
@gostreap (Collaborator) left a comment


The reward looks very well designed to me and will probably improve performance compared to previous general rewards. I look forward to this being merged into the main branch.

However, before that, I think it is necessary to clean up the code a bit, especially by removing all the prints that were only needed during development of the reward.

It would also be nice to add some unit tests to keep the project on the right track in that respect.

Review comments (all resolved) were left on the following files:

  • src/CP/core/search/dfs.jl (2 comments)
  • src/CP/valueselection/learning/rewards/generalreward.jl (3 comments)
  • src/experiment/metrics/basicmetrics.jl (3 comments)
@@ -53,3 +53,44 @@ function (nn::CPNN)(states::BatchedDefaultTrajectoryState)
return output
end
end

# Overloads the Base.string() function for storing parameters of the neural networks associated to experiments.
function Base.string(nn::CPNN)
Collaborator commented:

Do we really need this, as long as CPNN will be partially refactored?

Collaborator Author replied:

It has nothing to do with the reward, but it is really useful for keeping track of the neural networks we use in experiments (the returned string is saved in the params.json files). To run experiments on the new heterogeneous pipeline we have duplicated CPNN, so I think we should keep it like this for the moment.
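
For context, such an overload can be sketched generically as below; this is not the code added in this diff, and the exact string format written to the params.json files is an assumption.

# Generic sketch of a Base.string overload for CPNN; the actual overload in this PR
# may list the fields and format them differently. The resulting string is meant to be
# stored in the experiment's params.json file to document the network architecture.
function Base.string(nn::CPNN)
    parts = [string(f) * "=" * string(getfield(nn, f)) for f in fieldnames(typeof(nn))]
    return "CPNN(" * join(parts, ", ") * ")"
end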

@3rdCore (Collaborator) commented May 11, 2022

I totally agree with @gostreap regarding the testsets!

@louis-gautier requested a review from gostreap on May 11, 2022, at 21:17
@louis-gautier (Collaborator, Author) commented:

Unit tests have been added for this reward. We should soon be able to merge, which will help us conduct tests on problems with objective variables for the new heterogeneous graph pipeline.
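
As a purely hypothetical sketch of such a testset, reusing the helper functions from the earlier reward sketch (the tests actually added in this PR may check different quantities):

using Test

@testset "GeneralReward sanity checks" begin
    nvars = 10
    # Binding more variables in a single decision should never give a smaller reward.
    @test decision_reward(0, 5, nvars) >= decision_reward(0, 1, nvars)
    # Tightening the objective domain yields a non-negative reward.
    @test objective_reward(20, 15, 30) >= 0
    # An infeasible leaf must receive a lower ending reward than a feasible one.
    @test ending_reward(false) < ending_reward(true)
end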

@louis-gautier merged commit e0fd95e into master on May 12, 2022