Enable imitation learning #223
Conversation
…learned heuristic.
Good job in general and well commented; I think it fits well in the solver.
It is also possible to avoid duplicating the functions common to `SimpleLearnedHeuristic` and `SupervisedLearnedHeuristic`. I will add this in a later commit.
It would be nice to add tests checking that `helpSolution` is correctly set, in addition to testing reproducibility with the `rng`.
…rch/SeaPearl.jl into imitation
@@ -32,7 +32,7 @@ Flux = "0.11 - 0.12"
 JuMP = "0.21"
 LightGraphs = "1.3"
 MathOptInterface = "0.9"
-NNlib = "0.7"
+NNlib = "0.8.4"
Useful for working with ReinforcementLearning.jl in dev mode.
""" | ||
function (valueSelection::SupervisedLearnedHeuristic)(PHASE::Type{DecisionPhase}, model::CPModel, x::Union{Nothing,AbstractIntVar}) | ||
# domain change metrics, set before reward is updated | ||
set_metrics!(PHASE, valueSelection.search_metrics, model, x) |
Are we sure we want the metrics to be totally independent from the chosen value heuristic type?
I'm not sure I understand; what would you consider here?
        return valueSelection.eta_stable
    else
        steps_left = valueSelection.warmup_steps + valueSelection.decay_steps - step
        return valueSelection.eta_stable + steps_left / valueSelection.decay_steps * (valueSelection.eta_init - valueSelection.eta_stable)
Do we want a linear decay like this, or would an exponential one be better suited?
It would indeed be nice to be able to choose, but it doesn't seem to me to be absolutely essential at the moment.
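For reference, a small standalone sketch contrasting the two schedules; the warmup clamp, the default values, and the `eta_exponential` variant with its `decay_rate` are illustrative assumptions, not SeaPearl code:

```julia
# Linear schedule, mirroring the snippet above: eta goes from eta_init to
# eta_stable over decay_steps steps, after a warmup period.
function eta_linear(step; eta_init=1.0, eta_stable=0.1, warmup_steps=100, decay_steps=400)
    step <= warmup_steps && return eta_init
    step >= warmup_steps + decay_steps && return eta_stable
    steps_left = warmup_steps + decay_steps - step
    return eta_stable + steps_left / decay_steps * (eta_init - eta_stable)
end

# Hypothetical exponential alternative: same endpoints, but the guidance
# probability drops fast at first and flattens out afterwards.
function eta_exponential(step; eta_init=1.0, eta_stable=0.1, warmup_steps=100, decay_rate=0.01)
    step <= warmup_steps && return eta_init
    return eta_stable + (eta_init - eta_stable) * exp(-decay_rate * (step - warmup_steps))
end
```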
false_x = first(values(model.variables))
env = SeaPearl.get_observation!(lh, model, false_x)
It's great to do step-by-step tests like this; thank you very much for the time spent on this. It will ensure that the behavior of `SupervisedLearnedHeuristic` is as expected.
            valueSelection.helpSolution = solutions[1]
        end
    end
    reset_model!(model_duplicate)
Why do we reset the duplicated model?
This is probably unnecessary and should indeed be removed.
if !isnothing(model_duplicate.statistics.solutions)
    solutions = model_duplicate.statistics.solutions[model_duplicate.statistics.solutions .!= nothing]
    if length(solutions) >= 1
        valueSelection.helpSolution = solutions[1]
Why do we select the first solution found instead of the last one, which is supposed to be of better quality?
This could indeed be a much better approach!
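If we go that route, here is a minimal sketch of the idea; the `last_solution` helper and the example data are hypothetical, not part of the package:

```julia
# Hypothetical helper illustrating the suggestion: keep the last non-nothing
# solution instead of the first, since later solutions are usually better.
function last_solution(solutions::AbstractVector)
    found = filter(!isnothing, solutions)
    return isempty(found) ? nothing : found[end]
end

# With a run that improved its incumbent over time, the last entry wins:
last_solution([nothing, Dict("x" => 3), Dict("x" => 2)])   # Dict("x" => 2)
```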
end

# Reset the agent, useful for things like recurrent networks
Flux.reset!(valueSelection.agent)
I'm just wondering why we added this line in the first place.
I honestly have no idea; it did not seem to cause any problems, but I don't know whether it is useful.
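For context, a minimal standalone illustration of what `Flux.reset!` does for a recurrent layer (plain Flux usage with the 0.11/0.12-style constructor, unrelated to SeaPearl internals):

```julia
using Flux

# A recurrent layer keeps hidden state across calls; reset! restores the
# initial state, so the same input produces the same output again.
rnn = Flux.RNN(2, 3)
x = rand(Float32, 2)
y1 = rnn(x)          # first call updates the hidden state
y2 = rnn(x)          # second call depends on that updated state
Flux.reset!(rnn)     # clear the hidden state
y3 = rnn(x)
y1 == y3             # true: behaves like a fresh layer again
```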
Get the current value of `eta` (η), which is the probability for the solver to calculate and provide a classic CP-generated solution to the agent.
"""
function get_eta(valueSelection::SupervisedLearnedHeuristic)
The epsilon (ε) of the ε-greedy explorer represents the propensity of the agent to make random decisions when it acts, while the eta (η) represents the propensity of the heuristic to guide the agent in a supervised way, without letting it make a decision.
To me, these are distinct quantities that should not be linked.
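To make the distinction concrete, a toy sketch; all names, defaults, and the simulation itself are illustrative assumptions, with only the per-episode role of `eta` and the per-decision role of `epsilon` taken from this discussion:

```julia
using Random

# eta is drawn once per episode and decides whether a CP solution guides the
# agent; epsilon is drawn at every decision inside the explorer.
function simulate_episode(rng; eta=0.5, epsilon=0.1, n_decisions=4)
    guided = rand(rng) < eta            # InitializingPhase: compute helpSolution?
    return map(1:n_decisions) do _      # DecisionPhase, one entry per decision
        if guided
            :cp_solution                # imitate the precomputed CP solution
        elseif rand(rng) < epsilon
            :random                     # explorer picks a random value
        else
            :greedy                     # agent follows its learned policy
        end
    end
end

simulate_episode(MersenneTwister(0))    # a vector of 4 action symbols
```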
x = variable_heuristic(model)
v = lh(SeaPearl.DecisionPhase, model, x)
@test v == lh.helpSolution[x.id]
lh(SeaPearl.EndingPhase, model, :Feasible)
Is this line useful?
We have tried to keep the heuristic calls as close as possible to how they are made during a real solve, in order to detect possible problems in the future. However, I don't think that deleting it would influence the result of the test in the current state of the project and the heuristic.
I am still in favor of keeping this line in the test.
lh(SeaPearl.InitializingPhase, model)
@test isnothing(lh.helpSolution)
SeaPearl.reset_model!(model)

lh(SeaPearl.InitializingPhase, model)
@test isnothing(lh.helpSolution)
SeaPearl.reset_model!(model)

lh(SeaPearl.InitializingPhase, model)
@test isnothing(lh.helpSolution)
SeaPearl.reset_model!(model)

lh(SeaPearl.InitializingPhase, model)
@test !isnothing(lh.helpSolution)
SeaPearl.reset_model!(model)

lh(SeaPearl.InitializingPhase, model)
@test isnothing(lh.helpSolution)
Nice check using the `rng`.
From the last comments, it seems that we have merged the branch a bit early. I will work on a new small pull request to correct the few points mentioned.
This PR adds an imitation learning functionality in a new `SupervisedLearnedHeuristic`, which is an adaptation of `LearnedHeuristic` (whose name has changed to `SimpleLearnedHeuristic`).

At the `InitializingPhase`, i.e. at the beginning of each episode, and with a probability `eta`, the `SupervisedLearnedHeuristic` solves the instance using classic CP and adds the solution to its field `helpSolution`, which is set to `nothing` by default.

At the `DecisionPhase`:
i. if `helpSolution` is `nothing`, the agent takes an action following its policy: `action = valueSelection.agent(env)`;
ii. if `helpSolution` contains a solution, the agent takes the action corresponding to that solution (see the sketch below). At the end of the episode, the CP-obtained solution is reconstituted.

This approach is inspired by Gasse et al. (https://arxiv.org/abs/1906.01629) and allows the agent to learn from "good" examples from the very beginning of its training.
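A hedged sketch of the `DecisionPhase` branching described above; the toy struct and the `decide` helper are illustrative stand-ins, not the actual SeaPearl implementation, and only `helpSolution`, `agent`, and the solution lookup by variable id come from the PR itself:

```julia
# Toy stand-ins so the sketch is self-contained; the real types live in SeaPearl.jl.
struct ToySupervisedHeuristic
    agent            # callable policy: env -> value
    helpSolution     # nothing, or a Dict mapping variable ids to values
end

function decide(valueSelection::ToySupervisedHeuristic, env, x_id)
    if isnothing(valueSelection.helpSolution)
        return valueSelection.agent(env)          # i. follow the learned policy
    else
        return valueSelection.helpSolution[x_id]  # ii. imitate the CP solution
    end
end

# Usage with dummy data:
h = ToySupervisedHeuristic(env -> 0, Dict("x1" => 3))
decide(h, nothing, "x1")   # 3, imitating the CP-found assignment
```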