New explorers and tripartite graph plot utils #239

Merged
merged 145 commits into from
Jul 8, 2022

Conversation

louis-gautier
Collaborator

@louis-gautier louis-gautier commented Jun 15, 2022

This PR introduces two new explorers (a softmax explorer with temperature decay and a UCB explorer) to test new exploration strategies for our RL agents.
It also adds a utility for plotting tripartite graphs once they have been initialized.

gostreap and others added 30 commits May 9, 2022 09:38
3rdCore and others added 21 commits May 20, 2022 11:45
Conflicts:
	Project.toml
	src/RL/nn_structures/heterogeneouscpnn.jl
	src/RL/nn_structures/heterogeneousvariableoutputcpnn.jl
	src/RL/representation/default/cp_layer/accessors.jl
	src/RL/representation/default/defaultstaterepresentation.jl
	src/RL/representation/default/heterogeneousstaterepresentation.jl
	src/RL/utils/geometricflux/heterogeneousgraphconv.jl
	test/CP/valueselection/learning/environment.jl
	test/RL/nn_structures/heterogeneousfullfeaturedcpnn.jl
	test/RL/representation/default/defaultstaterepresentation.jl
	test/RL/representation/default/defaulttrajectorystate.jl
	test/RL/representation/default/heterogeneousstaterepresentation.jl
	test/datagen/coloring.jl
@louis-gautier louis-gautier changed the title from "Softmax explorer with temperature decay" to "New explorers and tripartite graph plot utils" Jun 15, 2022
@louis-gautier louis-gautier requested a review from gostreap June 15, 2022 14:49
src/RL/representation/graphplotutils.jl (outdated, resolved)
)
end

function get_T(s::SoftmaxTDecayExplorer, step)
Collaborator

Correct me if I'm wrong, but the explorer has a temperature that decreases linearly over the course of training. The temperature is used inside the softmax function that computes a probability distribution from the Q-vector. A decision on the variable is then drawn from this discrete distribution?
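
For reference, the mechanism described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the type `LinearTDecaySoftmax`, its fields, the helpers `temperature`, `masked_softmax` and `sample_index`, and the explicit `rng` argument are all hypothetical names chosen for the example. It also assumes `decay_steps > 0` and a strictly positive final temperature.

```julia
using Random

# Hypothetical stand-in for the explorer under review; field names are assumptions.
mutable struct LinearTDecaySoftmax
    T_init::Float64      # temperature at step 0
    T_final::Float64     # temperature once the decay is over (assumed > 0)
    decay_steps::Int     # number of steps over which T decreases linearly (assumed > 0)
    step::Int
    is_training::Bool
end

# Linear interpolation from T_init to T_final, constant afterwards.
function temperature(s::LinearTDecaySoftmax, step::Integer)
    frac = clamp(step / s.decay_steps, 0.0, 1.0)
    return s.T_init + frac * (s.T_final - s.T_init)
end

# Softmax over the allowed actions only; masked-out actions get probability 0.
function masked_softmax(values::AbstractVector{<:Real}, mask::AbstractVector{Bool}, T::Real)
    any(mask) || throw(ArgumentError("mask must allow at least one action"))
    scaled = [m ? v / T : -Inf for (v, m) in zip(values, mask)]
    w = exp.(scaled .- maximum(scaled))   # subtract the max for numerical stability
    return w ./ sum(w)
end

# Draw an index from a discrete distribution by inverting its cumulative sum.
function sample_index(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)
    idx = findfirst(>(u), cumsum(p))
    # Fallback guards against the cumulative sum ending slightly below 1.
    return idx === nothing ? findlast(>(0), p) : idx
end

# Calling the explorer: compute T, advance the step counter while training, sample.
function (s::LinearTDecaySoftmax)(rng::AbstractRNG, values, mask)
    T = temperature(s, s.step)
    s.is_training && (s.step += 1)
    return sample_index(rng, masked_softmax(values, mask, T))
end
```

With a high temperature the distribution is close to uniform over the allowed actions (more exploration); as the temperature decreases, the probability mass concentrates on the action with the largest Q-value.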


function (s::SoftmaxTDecayExplorer)(values, mask)
T = get_T(s, s.step)
s.is_training && (s.step += 1)
Collaborator

Can you explain: && (s.step += 1)?
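
For context, `a && b` in Julia is short-circuit evaluation: `b` is evaluated only when `a` is `true`, so the line in question increments the step counter only while training. The sketch below shows the idiom next to its explicit `if` equivalent; `StepCounter`, `tick!` and `tick_verbose!` are hypothetical names used for illustration only and are not part of this PR.

```julia
# Hypothetical type, for illustration only.
mutable struct StepCounter
    step::Int
    is_training::Bool
end

function tick!(c::StepCounter)
    # Short-circuit form: the right-hand side runs only if the left-hand side is true.
    c.is_training && (c.step += 1)
    return c.step
end

function tick_verbose!(c::StepCounter)
    # Equivalent explicit form.
    if c.is_training
        c.step += 1
    end
    return c.step
end
```

The parentheses around the update are needed because assignment binds less tightly than `&&`.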

Co-authored-by: Tom Marty <59280588+3rdCore@users.noreply.github.com>
Collaborator Author

@louis-gautier louis-gautier left a comment

I think that we can merge this branch into master now.

@louis-gautier louis-gautier merged commit b8a2f86 into master Jul 8, 2022
@gostreap gostreap deleted the new_explorers branch August 17, 2022 19:39
4 participants