New explorers and tripartite graph plot utils #239
…l into heterogeneous
Conflicts:
	Project.toml
	src/RL/nn_structures/heterogeneouscpnn.jl
	src/RL/nn_structures/heterogeneousvariableoutputcpnn.jl
	src/RL/representation/default/cp_layer/accessors.jl
	src/RL/representation/default/defaultstaterepresentation.jl
	src/RL/representation/default/heterogeneousstaterepresentation.jl
	src/RL/utils/geometricflux/heterogeneousgraphconv.jl
	test/CP/valueselection/learning/environment.jl
	test/RL/nn_structures/heterogeneousfullfeaturedcpnn.jl
	test/RL/representation/default/defaultstaterepresentation.jl
	test/RL/representation/default/defaulttrajectorystate.jl
	test/RL/representation/default/heterogeneousstaterepresentation.jl
	test/datagen/coloring.jl
)
end

function get_T(s::SoftmaxTDecayExplorer, step)
Correct me if I'm wrong, but the explorer has a temperature that decreases linearly over the course of training. The temperature is used inside the softmax function that computes a probability distribution from the Q-vector. A decision on the variable is then drawn from this discrete distribution?
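To make the behaviour described in this comment concrete, here is a minimal sketch under stated assumptions. It is not the PR's implementation: `LinearDecaySoftmax`, its fields, `temperature`, `masked_softmax` and `sample_action` are hypothetical names, the decay is assumed to be linear from `T_start` down to a strictly positive `T_end`, and `mask` is assumed to mark the allowed actions.

```julia
using Random

# Hypothetical stand-in for the explorer under review; field names are assumptions.
mutable struct LinearDecaySoftmax
    T_start::Float64   # temperature at step 0
    T_end::Float64     # final temperature, assumed > 0
    decay_steps::Int   # number of steps over which the temperature decays
    step::Int
    is_training::Bool
end

# Temperature decreases linearly from T_start to T_end, then stays at T_end.
function temperature(e::LinearDecaySoftmax, step::Integer)
    frac = clamp(step / e.decay_steps, 0.0, 1.0)
    return e.T_start + frac * (e.T_end - e.T_start)
end

# Softmax over the allowed actions only; masked-out actions get probability 0.
function masked_softmax(values::AbstractVector{<:Real}, mask::AbstractVector{Bool}, T::Real)
    any(mask) || throw(ArgumentError("mask must allow at least one action"))
    scaled = [m ? v / T : -Inf for (v, m) in zip(values, mask)]
    shifted = exp.(scaled .- maximum(scaled))  # subtract the max for numerical stability
    return shifted ./ sum(shifted)
end

# Draw an action index from the discrete distribution by inverse-CDF sampling.
function sample_action(rng::AbstractRNG, probs::AbstractVector{<:Real})
    u = rand(rng)
    c = 0.0
    for (i, p) in enumerate(probs)
        c += p
        u < c && return i
    end
    return findlast(>(0), probs)  # guard against floating-point round-off
end

# Calling the explorer: compute T, advance the step when training, then sample.
function (e::LinearDecaySoftmax)(rng::AbstractRNG, values, mask)
    T = temperature(e, e.step)
    e.is_training && (e.step += 1)
    return sample_action(rng, masked_softmax(values, mask, T))
end
```

With a high temperature the distribution is close to uniform over the allowed actions (more exploration); as the temperature shrinks, the probability mass concentrates on the action with the largest Q-value (more exploitation).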

function (s::SoftmaxTDecayExplorer)(values, mask)
    T = get_T(s, s.step)
    s.is_training && (s.step += 1)
Can you explain: && (s.step += 1)?
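For context on the idiom being asked about: in Julia, `a && b` short-circuits, so `b` is evaluated only when `a` is true. The line therefore increments the step counter only while training. The sketch below illustrates this with a hypothetical `Counter` type (not part of the PR) and shows the equivalent `if` form.

```julia
# Hypothetical type used only to illustrate the idiom.
mutable struct Counter
    step::Int
    is_training::Bool
end

# Short-circuit form: the increment runs only when `is_training` is true.
# The inner parentheses are required because assignment binds more loosely than `&&`.
tick!(c::Counter) = (c.is_training && (c.step += 1); c)

# Equivalent long form.
function tick_verbose!(c::Counter)
    if c.is_training
        c.step += 1
    end
    return c
end
```

Both forms behave the same; the short-circuit version is a common Julia shorthand for a one-line conditional.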
Co-authored-by: Tom Marty <59280588+3rdCore@users.noreply.github.com>
I think that we can merge this branch into master now.
This PR introduces two new explorers (a softmax explorer with temperature decay and a UCB explorer) to test new exploration strategies for our RL agents.
It also adds a utility for plotting tripartite graphs after they are initialized.