[RLlib] CQL change hparams and data reading strategy #27451

Merged · 3 commits · Aug 5, 2022
2 changes: 1 addition & 1 deletion release/release_tests.yaml
@@ -2739,7 +2739,7 @@

cluster:
cluster_env: app_config.yaml
-    cluster_compute: 2gpus_32cpus.yaml
+    cluster_compute: 1gpu_16cpus.yaml

run:
timeout: 18000
21 changes: 21 additions & 0 deletions release/rllib_tests/1gpu_16cpus.yaml
@@ -0,0 +1,21 @@
cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2

max_workers: 0

head_node_type:
name: head_node
instance_type: g3.4xlarge

worker_node_types:
- name: worker_node
instance_type: m5.xlarge
min_workers: 0
max_workers: 0
use_spot: false

aws:
BlockDeviceMappings:
- DeviceName: /dev/sda1
Ebs:
VolumeSize: 500
@@ -4,12 +4,15 @@ cql-halfcheetahbulletenv-v0:
pass_criteria:
evaluation/episode_reward_mean: 400.0
# Can not check throughput for offline methods.
-    # timesteps_total: 10000000
+    timesteps_total: 2500000
stop:
-    time_total_s: 3600
+    time_total_s: 1800
config:
# Use input produced by expert SAC algo.
-    input: ["~/halfcheetah_expert_sac.zip"]
+    input: "dataset"
+    input_config:
+      format: "json"
+      paths: "s3://air-example-data/rllib/half_cheetah/half_cheetah.json"
actions_in_input_normalized: true

soft_horizon: False
@@ -25,19 +28,18 @@ cql-halfcheetahbulletenv-v0:
no_done_at_end: false
n_step: 3
rollout_fragment_length: 1
replay_buffer_config:
type: MultiAgentReplayBuffer
learning_starts: 256
num_workers: 8
grad_clip: 40
train_batch_size: 256
target_network_update_freq: 0
min_train_timesteps_per_iteration: 1000
optimization:
actor_learning_rate: 0.0001
critic_learning_rate: 0.0003
entropy_learning_rate: 0.0001
num_workers: 0
num_gpus: 1
metrics_smoothing_episodes: 5
+    min_time_s_per_iteration: 30
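For context, the switch above from a local expert-SAC zip to the dataset-based reader corresponds to an offline-data config like the following. This is a minimal sketch as a plain dict mirroring the YAML; how RLlib consumes these keys depends on the Ray version.

```python
# Offline-data portion of the CQL release test config, mirroring the diff
# above. The S3 path and keys are taken directly from the YAML; this dict
# is illustrative, not a guaranteed RLlib API surface.
offline_data = {
    # Read experience via the dataset-based JSON reader instead of a
    # local file produced by an expert SAC policy.
    "input": "dataset",
    "input_config": {
        "format": "json",
        "paths": "s3://air-example-data/rllib/half_cheetah/half_cheetah.json",
    },
    # Actions stored in the file are already normalized.
    "actions_in_input_normalized": True,
}

# The replaced setting pointed at a local archive of expert SAC rollouts:
legacy_offline_data = {"input": ["~/halfcheetah_expert_sac.zip"]}
```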
Contributor:

Was this necessary? This is offline RL, so a larger number of iterations should have worked equally well here. If this change is necessary, something does not make sense; if it wasn't necessary, we should remove this hparam and instead increase timesteps_total to be consistent with the other learning tests.

Member Author:

Wait, this parameter controls the logging frequency. I increased it so that we don't run too many unnecessary evaluations.

Contributor:

It's not just that; it also sets the minimum training time spent per iteration. The same training_step() function keeps running until that timing requirement is met. In other words, you keep taking gradient updates until that time is reached.
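The iteration behavior described here can be sketched as follows. The function and parameter names (`training_step`, `min_time_s`, `min_train_timesteps`) mirror the config keys and the discussion above, but this is a simplification of RLlib's actual loop, not its implementation.

```python
import time

def run_one_iteration(training_step, min_time_s=30.0, min_train_timesteps=1000):
    """Simplified sketch: repeat training_step() until BOTH the minimum
    iteration time and the minimum trained-timestep count are satisfied."""
    start = time.monotonic()
    timesteps = 0
    num_updates = 0
    while (time.monotonic() - start < min_time_s) or (timesteps < min_train_timesteps):
        # Each call performs one round of gradient updates and reports
        # how many timesteps it trained on.
        timesteps += training_step()
        num_updates += 1
    return num_updates, timesteps

# Toy usage: each step "trains" on 256 timesteps instantly, so with
# min_time_s=0 only the timestep minimum binds (4 steps reach 1024 >= 1000).
updates, ts = run_one_iteration(lambda: 256, min_time_s=0.0, min_train_timesteps=1000)
```

This is why raising min_time_s_per_iteration changes more than logging cadence: it directly increases the number of gradient updates packed into each reported iteration.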

Contributor:

We chatted offline, and my concern was more about reproducibility. These nits are not merge blockers, so please merge if this is the only thing holding the PR back.

Member Author:

Will open a separate PR that addresses this issue across all release tests and environments.


# CQL Configs
min_q_weight: 5.0
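min_q_weight scales CQL's conservative penalty, which pushes Q-values down on out-of-distribution actions while pushing them up on dataset actions. A rough NumPy sketch, simplified to a finite set of candidate actions (the real RLlib implementation samples actions and differs in details):

```python
import numpy as np

def cql_penalty(q_candidate_actions, q_data_action, min_q_weight=5.0):
    """Simplified CQL regularizer for one state:
    logsumexp of Q over candidate actions (penalizes overestimated
    out-of-distribution actions) minus Q of the action actually taken
    in the dataset, scaled by min_q_weight."""
    logsumexp_q = np.log(np.sum(np.exp(q_candidate_actions)))
    return min_q_weight * (logsumexp_q - q_data_action)

# Toy usage: when some candidate actions have inflated Q-values relative
# to the dataset action, the penalty is positive and discourages them.
penalty = cql_penalty(np.array([1.0, 2.0, 0.5]), q_data_action=2.0)
```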