Creating a general reward encouraging a smart exploration of the tree #226
Conversation
…arch/SeaPearl.jl into louis/general_reward
Conflicts: src/CP/core/model.jl, src/experiment/metrics/basicmetrics.jl
It would be nice to have some comparison with other rewards known to work on specific problems (tsptw, graphColoring, ...) before merging. What do you think of this?

Sure. Here is a comparison of the two rewards. On TSPTW(10), however, no learning can be observed (the rewards are consistent but the agent does not learn), probably due to pitfalls in the treatment of the representation, as the graph is very large on TSPTW. @malikattalah has worked on comparing the 2 rewards on nqueens.
The reward looks very well designed to me and will probably improve performance compared to previous general rewards. I look forward to this being merged into the main branch.
However, before that, I think it is necessary to clean up the code a bit, especially by removing all the print statements that were only needed during development of the reward.
It would also be nice to add some unit tests to keep the project on the right track in that respect.
@@ -53,3 +53,44 @@ function (nn::CPNN)(states::BatchedDefaultTrajectoryState)
        return output
    end
end

# Overloads the Base.string() function for storing parameters of the neural networks associated to experiments.
function Base.string(nn::CPNN)
Do we really need this, given that `CPNN` will be partially refactored?
It has nothing to do with the reward, but it is really useful for keeping track of the neural networks we use when we run experiments (the returned string is saved in the params.json files). For experiments on the new heterogeneous pipeline we have duplicated `CPNN`, so I think we should keep it like this for the moment.
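For context, a minimal sketch of what such an overload could look like (assuming SeaPearl's `CPNN` type is in scope; the field names `graphChain`, `nodeChain`, `outputChain` and the output format are assumptions for illustration, not the code actually added in this PR):

```julia
# Illustrative sketch only: produce a readable description of a CPNN so it can be
# dumped into params.json alongside the other experiment parameters.
# The field names below are assumptions about CPNN's internals.
function Base.string(nn::CPNN)
    return "CPNN(" *
           "graphChain=" * string(nn.graphChain) * ", " *
           "nodeChain=" * string(nn.nodeChain) * ", " *
           "outputChain=" * string(nn.outputChain) * ")"
end
```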
I totally agree with @gostreap regarding the testsets!

Unit tests have been added for this reward. We should soon be able to merge, which will help us conduct tests on problems with objective variables for the new heterogeneous graph pipeline.
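As an illustration of the kind of property such a testset can check, here is a small self-contained sketch (the helper `sketch_decision_reward` is a hypothetical stand-in, not `ExperimentalReward` itself, and these are not the tests actually merged):

```julia
using Test

# Hypothetical stand-in for the per-decision reward, used only for this sketch.
sketch_decision_reward(nb_newly_bound, nb_variables) = (nb_newly_bound / nb_variables)^2

@testset "decision reward sketch" begin
    # Bounding two variables in one assignation pays more than bounding them one by one,
    # which is the convexity property described in the PR description below.
    @test sketch_decision_reward(2, 10) > 2 * sketch_decision_reward(1, 10)
    # The reward stays within [0, 1].
    @test 0.0 <= sketch_decision_reward(4, 10) <= 1.0
end
```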
This PR adds a reward called `ExperimentalReward` (it will have to be renamed to `GeneralReward` once merged with master). The goal of this reward is to be as general as possible, so as to encourage a smart exploration of the tree of solutions of any problem solved using `DefaultStateRepresentation`. Its functioning is briefly exposed below: rewards are given throughout the search rather than only at the `EndingPhase`, as was previously done with `CPReward`. This should allow faster learning in large search trees and foster a smart exploration of this tree.

The rewards are given as described here (a self-contained sketch of these terms follows the list):

- We encourage the agent to bound as many variables as possible in one assignation. The agent thus receives, at each decision, a reward that is a convex function of the number of variables bound by the assignation, so that bounding several variables at once is worth more than bounding them one by one.
- At the `EndingPhase`, the agent receives a negative penalty if it reaches an infeasible solution.
- We encourage the agent to keep low values in the domain of the objective variable throughout the resolution. The agent thus receives, at each decision, a reward that increases as the objective domain is restricted to low values.
- At the `EndingPhase`, the agent receives a negative penalty of -1 if it reaches an infeasible solution.
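To make the intermediate terms above concrete, here is a small self-contained Julia sketch of how such per-decision rewards could be computed. It deliberately does not use SeaPearl's reward API, and the quadratic exponent and the normalizations are illustrative assumptions rather than the formulas actually implemented by `ExperimentalReward`:

```julia
# Illustrative sketch only; the exact formulas of ExperimentalReward are not reproduced here.

# Term 1: encourage bounding many variables in a single assignation.
# A convex function (here a square) of the fraction of newly bound variables makes
# bounding several variables at once worth more than bounding them one by one.
assignation_reward(nb_newly_bound::Int, nb_variables::Int) =
    (nb_newly_bound / nb_variables)^2

# Term 2: encourage keeping low values in the objective variable's domain.
# The reward grows as the current domain maximum gets closer to the initial lower bound.
function objective_reward(current_max::Int, initial_min::Int, initial_max::Int)
    span = initial_max - initial_min
    span == 0 && return 0.0
    return (initial_max - current_max) / span
end

# Intermediate reward received at a decision node.
decision_reward(nb_newly_bound, nb_variables, current_max, initial_min, initial_max) =
    assignation_reward(nb_newly_bound, nb_variables) +
    objective_reward(current_max, initial_min, initial_max)

# EndingPhase: a fixed negative penalty when an infeasible leaf is reached.
infeasible_penalty() = -1.0
```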