
[rllib] Modularize policy graph and trainer construction #4788

Closed
ericl opened this issue May 15, 2019 · 0 comments · Fixed by #5242
ericl commented May 15, 2019

Describe the problem

A couple of improvements could be made to make it easier to customize policy graphs and trainers without needing to modify the RLlib source code directly. This would be in line with the example here (but also including a builder for the policy graph itself): https://gist.github.com/ericl/0d3502f204c7612a429bfd3c3aba0307

For example:

PPOPolicyGraph = build_tf_policy_graph(
    model, loss_inputs, loss, ...?)
PPOTrainer = build_trainer(
    "PPO",
    default_config=DEFAULT_CONFIG,
    policy_graph=PPOPolicyGraph,
    make_optimizer=make_optimizer,
    validate_config=validate_config,
    after_optimizer_step=update_kl,
    before_train_step=warn_about_obs_filter,
    after_train_result=warn_about_bad_reward_scales)
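
As a rough sketch of what the pieces handed to these builders might look like (the signatures and attribute names here are only illustrative, nothing is settled), the hooks could just be plain functions:

def validate_config(config):
    # Illustrative validate_config hook: sanity-check the trainer config
    # before anything is constructed.
    if config["train_batch_size"] < config["sgd_minibatch_size"]:
        raise ValueError("train_batch_size must be >= sgd_minibatch_size")

def update_kl(trainer, fetches):
    # Illustrative after_optimizer_step hook: adapt the KL coefficient from
    # the KL divergence measured during the last optimizer step.
    # `trainer.kl_coeff` and `trainer.kl_target` are assumed attributes.
    if "kl" in fetches:
        if fetches["kl"] > 2.0 * trainer.kl_target:
            trainer.kl_coeff *= 1.5
        elif fetches["kl"] < 0.5 * trainer.kl_target:
            trainer.kl_coeff *= 0.5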

We can also try to expose more of the loss input tensors to the Model class itself, so that custom losses can be defined without modifying the policy graph (though more complex losses may still require changes there).
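
For instance, a custom model could then attach an auxiliary term directly to the policy loss. A minimal sketch, assuming the tensors are surfaced through something like the existing Model.custom_loss hook and that loss_inputs carries a key such as "advantages" (both assumptions, not a settled interface):

import tensorflow as tf
from ray.rllib.models.model import Model

class MyModel(Model):
    # (model-building methods such as _build_layers_v2 omitted for brevity)

    def custom_loss(self, policy_loss, loss_inputs):
        # Assumed: loss_inputs exposes the policy graph's loss input tensors
        # (observations, actions, advantages, ...), so an auxiliary term can
        # be added here without editing the policy graph itself.
        aux_loss = tf.reduce_mean(tf.square(loss_inputs["advantages"]))
        return policy_loss + 0.01 * aux_loss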
