[RLlib] Fix performance and functionality flaws in attention nets (via Trajectory view API). #11729

Closed
194 commits
e7f7e09
WIP.
sven1977 Oct 23, 2020
b03d49c
WIP.
sven1977 Oct 24, 2020
2467064
WIP.
sven1977 Oct 24, 2020
20e1dc6
WIP.
sven1977 Oct 25, 2020
dfd299c
WIP.
sven1977 Oct 26, 2020
88b17de
WIP.
sven1977 Oct 26, 2020
684360d
WIP.
sven1977 Oct 26, 2020
7121ded
WIP.
sven1977 Oct 26, 2020
a8f46b3
WIP.
sven1977 Oct 26, 2020
8940667
WIP.
sven1977 Oct 26, 2020
4747f27
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 26, 2020
2d0287e
WIP.
sven1977 Oct 26, 2020
b0be1f8
Merge branch 'trajectory_view_api_enable_by_default_for_all_simple' i…
sven1977 Oct 26, 2020
84bd8d5
WIP.
sven1977 Oct 26, 2020
940e8d8
WIP.
sven1977 Oct 26, 2020
97fb268
WIP.
sven1977 Oct 27, 2020
9d75a84
WIP.
sven1977 Oct 27, 2020
eec4bc5
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 27, 2020
a3aebdf
WIP.
sven1977 Oct 27, 2020
c824acc
WIP.
sven1977 Oct 27, 2020
5149aba
WIP.
sven1977 Oct 27, 2020
22af6bd
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 27, 2020
8a30aa9
WIP.
sven1977 Oct 27, 2020
ca50712
WIP.
sven1977 Oct 27, 2020
8380ccf
WIP.
sven1977 Oct 28, 2020
d4ce5c4
WIP.
sven1977 Oct 28, 2020
c0e979f
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 28, 2020
85cfe0a
WIP.
sven1977 Oct 28, 2020
ddd9847
WIP.
sven1977 Oct 29, 2020
5ff50c7
WIP.
sven1977 Oct 29, 2020
5a0e682
WIP.
sven1977 Oct 29, 2020
7bea113
WIP.
sven1977 Oct 29, 2020
d707cc6
WIP.
sven1977 Oct 29, 2020
73a07b5
WIP.
sven1977 Oct 29, 2020
e71814c
WIP.
sven1977 Oct 30, 2020
a8530cc
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 30, 2020
1ffa52c
WIP.
sven1977 Oct 30, 2020
ef75111
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 30, 2020
fbd3563
Merge branch 'trajectory_view_api_enable_by_default_for_all_simple' i…
sven1977 Oct 30, 2020
dd64a49
WIP.
sven1977 Oct 30, 2020
5b26223
WIP.
sven1977 Oct 30, 2020
3d5c567
Fix.
sven1977 Oct 30, 2020
a991869
Fix.
sven1977 Oct 30, 2020
14cc799
LINT.
sven1977 Oct 30, 2020
8bd7075
Fixes and LINT.
sven1977 Oct 30, 2020
0405552
WIP.
sven1977 Oct 31, 2020
4ddb792
WIP.
sven1977 Oct 31, 2020
0a0d646
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 31, 2020
77ebbd0
Merge branch 'fix_torch_tf_eager_compute_grads_for_rnns' into traject…
sven1977 Oct 31, 2020
a4ed42f
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Oct 31, 2020
593426d
Fix.
sven1977 Oct 31, 2020
e60ae06
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 2, 2020
b7a737d
WIP.
sven1977 Nov 2, 2020
be63f0e
WIP.
sven1977 Nov 2, 2020
67b80f0
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 2, 2020
5a7020c
Fixes.
sven1977 Nov 2, 2020
3ca31cd
Fixes.
sven1977 Nov 2, 2020
f9cd241
LINT and fixes.
sven1977 Nov 2, 2020
dad163c
Fix.
sven1977 Nov 2, 2020
0cb32a9
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 3, 2020
8ad714f
WIP.
sven1977 Nov 3, 2020
a00f663
Fix and remove ARS/ES again (follow-up PR).
sven1977 Nov 3, 2020
6355a2a
Fix and remove ARS/ES again (follow-up PR).
sven1977 Nov 3, 2020
eb05419
Fix APPO and DDPPO w/ traj. view API.
sven1977 Nov 3, 2020
f9ab364
WIP.
sven1977 Nov 3, 2020
2826753
LINT.
sven1977 Nov 3, 2020
14f0e22
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 4, 2020
27d2d2f
Fixes.
sven1977 Nov 4, 2020
f2d0be6
Fixes.
sven1977 Nov 4, 2020
5895d80
WIP.
sven1977 Nov 4, 2020
7a4a1ee
WIP.
sven1977 Nov 4, 2020
27eeec1
WIP.
sven1977 Nov 4, 2020
4ee34f7
LINT.
sven1977 Nov 4, 2020
658999b
WIP.
sven1977 Nov 5, 2020
cd94f6f
Fix.
sven1977 Nov 5, 2020
e3e5072
Fix.
sven1977 Nov 5, 2020
59aeac4
Fix.
sven1977 Nov 5, 2020
03b558c
Fix.
sven1977 Nov 5, 2020
1ed4fc1
WIP.
sven1977 Nov 5, 2020
cbc5e11
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 5, 2020
645149b
WIP.
sven1977 Nov 9, 2020
f7bd6e9
WIP.
sven1977 Nov 9, 2020
2786d7c
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 9, 2020
e1f9380
Fixes.
sven1977 Nov 9, 2020
6f44e44
WIP.
sven1977 Nov 10, 2020
59bac46
WIP.
sven1977 Nov 10, 2020
8515748
Fixes.
sven1977 Nov 10, 2020
47ce4bd
LINT and Fixes.
sven1977 Nov 10, 2020
1dd256f
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 10, 2020
fb135bc
WIP.
sven1977 Nov 11, 2020
5b6b73f
LINT and Fixes.
sven1977 Nov 11, 2020
7321701
Fixes.
sven1977 Nov 12, 2020
952ec40
Fixes.
sven1977 Nov 12, 2020
05b9aa9
Fixes.
sven1977 Nov 12, 2020
177296d
LINT.
sven1977 Nov 12, 2020
0f3908a
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 12, 2020
1e4def9
Merge branch 'trajectory_view_api_enable_by_default_for_some_tf' into…
sven1977 Nov 12, 2020
d35b2a0
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 12, 2020
7df6121
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 12, 2020
acd2076
LINT fixes.
sven1977 Nov 12, 2020
8f46ae3
LINT.
sven1977 Nov 12, 2020
dd3d0d4
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 12, 2020
f98de41
Fix 2 tests.
sven1977 Nov 13, 2020
07aad1c
LINT.
sven1977 Nov 13, 2020
f383319
WIP.
sven1977 Nov 13, 2020
2b34624
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 13, 2020
e6a142c
Clarifications and some undos.
sven1977 Nov 13, 2020
d3121bc
revert alpha-zero
sven1977 Nov 13, 2020
a1d448b
WIP.
sven1977 Nov 15, 2020
15d5016
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 16, 2020
ce181b4
WIP.
sven1977 Nov 16, 2020
f5396c1
WIP.
sven1977 Nov 16, 2020
b0642da
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 16, 2020
f95470e
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 16, 2020
8c94553
WIP.
sven1977 Nov 16, 2020
492763d
Merge branch 'trajectory_view_api_enable_by_default_for_sac_dqn_ddpg'…
sven1977 Nov 16, 2020
d2c7e5d
WIP.
sven1977 Nov 16, 2020
a6b9384
WIP.
sven1977 Nov 16, 2020
720f84b
WIP.
sven1977 Nov 16, 2020
313a585
WIP.
sven1977 Nov 16, 2020
b205910
WIP.
sven1977 Nov 17, 2020
b3977a0
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 17, 2020
896ac26
Merge branch 'sample_collector_fix_batch_size' into trajectory_view_a…
sven1977 Nov 17, 2020
066904a
WIP.
sven1977 Nov 17, 2020
7b24391
WIP.
sven1977 Nov 17, 2020
80d0495
WIP.
sven1977 Nov 17, 2020
9066adc
Merge branch 'sample_collector_fix_batch_size' into trajectory_view_a…
sven1977 Nov 17, 2020
a904b1c
WIP.
sven1977 Nov 17, 2020
c9641a2
WIP.
sven1977 Nov 17, 2020
b996fe1
WIP.
sven1977 Nov 17, 2020
4a87169
WIP.
sven1977 Nov 18, 2020
403fc01
Merge branch 'master' of https://github.com/ray-project/ray into curi…
sven1977 Nov 20, 2020
375ebbc
WIP.
sven1977 Nov 20, 2020
be21f32
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 23, 2020
f49cb94
Add PyBullet HalfCheetah test for SAC (tf and torch).
sven1977 Nov 23, 2020
216fd77
WIP.
sven1977 Nov 23, 2020
301f710
WIP.
sven1977 Nov 23, 2020
f4aa64c
WIP.
sven1977 Nov 24, 2020
48317fe
LEARNING, FAST VERSION!
sven1977 Nov 24, 2020
c0e1475
LINT, WIP.
sven1977 Nov 24, 2020
4d892cf
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 24, 2020
859ac16
LINT, fix.
sven1977 Nov 24, 2020
6f8ff16
WIP.
sven1977 Nov 24, 2020
847f73b
WIP.
sven1977 Nov 25, 2020
1e1ce1c
WIP.
sven1977 Nov 25, 2020
7d44b7f
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 26, 2020
b5a4bc1
WIP.
sven1977 Nov 26, 2020
0437680
Fix.
sven1977 Nov 26, 2020
f2227d3
WIP.
sven1977 Nov 26, 2020
86911d9
WIP.
sven1977 Nov 26, 2020
222f0a9
WIP.
sven1977 Nov 26, 2020
5e269c4
Fix.
sven1977 Nov 26, 2020
787810d
Fix.
sven1977 Nov 26, 2020
7113c32
WIP.
sven1977 Nov 26, 2020
2040c93
Merge branch 'attention_nets_prep_0' into attention_nets_prep_2
sven1977 Nov 26, 2020
f79faf7
Fixes and LINT.
sven1977 Nov 26, 2020
9320694
Merge branch 'master' of https://github.com/ray-project/ray into atte…
sven1977 Nov 27, 2020
b5a31b3
Merge branch 'master' of https://github.com/ray-project/ray into atte…
sven1977 Nov 27, 2020
045a6f2
Merge branch 'attention_nets_prep_2' into attention_nets_prep_3
sven1977 Nov 27, 2020
1d8fb50
Fixes and LINT.
sven1977 Nov 27, 2020
6769a2b
WIP.
sven1977 Nov 27, 2020
7176066
Fixes and LINT.
sven1977 Nov 27, 2020
ab077f6
WIP.
sven1977 Nov 27, 2020
bc084a2
Merge branch 'master' of https://github.com/ray-project/ray into atte…
sven1977 Nov 28, 2020
7241d82
Merge branch 'master' of https://github.com/ray-project/ray into atte…
sven1977 Nov 30, 2020
26839ba
merge
sven1977 Nov 30, 2020
9178a8c
Merge branch 'attention_nets_prep_2' into attention_nets_prep_3
sven1977 Nov 30, 2020
852b8e7
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Nov 30, 2020
0cd1dc5
Merge branch 'attention_nets_prep_3' into trajectory_view_api_attenti…
sven1977 Nov 30, 2020
b2cf88c
WIP.
sven1977 Nov 30, 2020
730f917
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 7, 2020
465ef0a
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 7, 2020
f5d8f3f
WIP.
sven1977 Dec 7, 2020
c77d09a
WIP.
sven1977 Dec 7, 2020
821448c
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 7, 2020
eaff083
WIP.
sven1977 Dec 7, 2020
0be8b71
WIP.
sven1977 Dec 7, 2020
d6ebefc
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 7, 2020
4872da6
WIP.
sven1977 Dec 7, 2020
331d7e4
WIP.
sven1977 Dec 7, 2020
6533c6e
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 8, 2020
83855b6
Fix.
sven1977 Dec 8, 2020
30de704
Fix.
sven1977 Dec 8, 2020
c442ffd
Fix.
sven1977 Dec 8, 2020
927dd14
Fix.
sven1977 Dec 8, 2020
21da0fd
WIP.
sven1977 Dec 8, 2020
d5351bd
WIP.
sven1977 Dec 8, 2020
68982de
merge
sven1977 Dec 9, 2020
2b00d76
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 10, 2020
b9b3120
WIP.
sven1977 Dec 10, 2020
e48cb0b
WIP.
sven1977 Dec 10, 2020
9b9067a
Fix.
sven1977 Dec 10, 2020
5671d3f
WIP.
sven1977 Dec 10, 2020
9df2a1b
Merge branch 'master' of https://github.com/ray-project/ray into traj…
sven1977 Dec 10, 2020
68 changes: 28 additions & 40 deletions rllib/agents/ppo/ppo_tf_policy.py
@@ -193,22 +193,25 @@ def postprocess_ppo_gae(
last_r = 0.0
# Trajectory has been truncated -> last r=VF estimate of last obs.
else:
# Input dict is provided to us automatically via the Model's
# requirements. It's a single-timestep (last one in trajectory)
# input_dict.
if policy.config["_use_trajectory_view_api"]:
# Create an input dict according to the Model's requirements.
input_dict = policy.model.get_input_dict(sample_batch, index=-1)
last_r = policy._value(**input_dict)
# TODO: (sven) Remove once trajectory view API is all-algo default.
state_in_view_req = policy.model.inference_view_requirements.get(
"state_in_0")
# Attention net.
if state_in_view_req and state_in_view_req.shift_from is not None:
next_state = []
for i in range(policy.num_state_tensors()):
view_req = policy.model.inference_view_requirements.get(
"state_in_{}".format(i))
next_state.append(sample_batch["state_out_{}".format(i)][
view_req.shift_from:view_req.shift_to + 1])
# Everything else.
else:
next_state = []
for i in range(policy.num_state_tensors()):
next_state.append(sample_batch["state_out_{}".format(i)][-1])
last_r = policy._value(sample_batch[SampleBatch.NEXT_OBS][-1],
sample_batch[SampleBatch.ACTIONS][-1],
sample_batch[SampleBatch.REWARDS][-1],
*next_state)
last_r = policy._value(sample_batch[SampleBatch.NEXT_OBS][-1],
sample_batch[SampleBatch.ACTIONS][-1],
sample_batch[SampleBatch.REWARDS][-1],
*next_state)

# Adds the policy logits, VF preds, and advantages to the batch,
# using GAE ("generalized advantage estimation") or not.
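The attention-net branch above reads the previous states with `sample_batch["state_out_{}".format(i)][view_req.shift_from:view_req.shift_to + 1]`, i.e. it treats the shift window as inclusive of `shift_to`. One Python detail is worth keeping in mind with that idiom: if `shift_to` were `-1` (the last row), then `shift_to + 1` is `0`, and a slice such as `x[-3:0]` is empty rather than the last three rows. The sketch below assumes shift windows are inclusive `(shift_from, shift_to)` pairs that may count from the end; `inclusive_window` is a hypothetical helper for illustration, not an RLlib API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_window(rows: Sequence[T], shift_from: int, shift_to: int) -> List[T]:
    """Return rows[shift_from..shift_to], inclusive on both ends.

    Negative shifts count from the end (-1 is the last row), as in ordinary
    Python indexing. Hypothetical helper, not part of RLlib.
    """
    n = len(rows)
    # Turn end-relative (negative) shifts into absolute indices.
    start = shift_from + n if shift_from < 0 else shift_from
    stop = shift_to + n if shift_to < 0 else shift_to
    # A window longer than the fragment starts at row 0 instead of
    # wrapping around to the end of the sequence.
    start = max(start, 0)
    # A window ending before row 0 is empty (keeps `stop + 1` >= 0).
    stop = max(stop, -1)
    # `stop + 1` makes the window include `shift_to`; because `stop` is now
    # absolute, the +1 can no longer produce an empty `[x:0]` slice.
    return list(rows[start:stop + 1])


rows = [0, 1, 2, 3, 4, 5]
print(rows[-3:-1 + 1])                 # naive end-relative slice -> []
print(inclusive_window(rows, -3, -1))  # -> [3, 4, 5]
```

Normalizing to absolute indices first is the design choice here: once both ends are non-negative, "+1 for inclusive" behaves the same for every window.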
@@ -303,34 +306,19 @@ def __init__(self, obs_space, action_space, config):
# observation.
if config["use_gae"]:

# Input dict is provided to us automatically via the Model's
# requirements. It's a single-timestep (last one in trajectory)
# input_dict.
if config["_use_trajectory_view_api"]:

@make_tf_callable(self.get_session())
def value(**input_dict):
model_out, _ = self.model.from_batch(
input_dict, is_training=False)
# [0] = remove the batch dim.
return self.model.value_function()[0]

# TODO: (sven) Remove once trajectory view API is all-algo default.
else:

@make_tf_callable(self.get_session())
def value(ob, prev_action, prev_reward, *state):
model_out, _ = self.model({
SampleBatch.CUR_OBS: tf.convert_to_tensor([ob]),
SampleBatch.PREV_ACTIONS: tf.convert_to_tensor(
[prev_action]),
SampleBatch.PREV_REWARDS: tf.convert_to_tensor(
[prev_reward]),
"is_training": tf.convert_to_tensor([False]),
}, [tf.convert_to_tensor([s]) for s in state],
tf.convert_to_tensor([1]))
# [0] = remove the batch dim.
return self.model.value_function()[0]
@make_tf_callable(self.get_session())
def value(ob, prev_action, prev_reward, *state):
model_out, _ = self.model({
SampleBatch.CUR_OBS: tf.convert_to_tensor([ob]),
SampleBatch.PREV_ACTIONS: tf.convert_to_tensor(
[prev_action]),
SampleBatch.PREV_REWARDS: tf.convert_to_tensor(
[prev_reward]),
"is_training": tf.convert_to_tensor([False]),
}, [tf.convert_to_tensor([s]) for s in state],
tf.convert_to_tensor([1]))
# [0] = remove the batch dim.
return self.model.value_function()[0]

# When not doing GAE, we do not require the value function's output.
else:
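In `postprocess_ppo_gae`, `last_r` is 0.0 when the trajectory ended in a terminal state, and a value-function estimate (`policy._value(...)`) when the trajectory was truncated; that bootstrap value is then used when the advantages are computed with GAE. The sketch below shows where `last_r` enters the standard GAE recursion. It is a plain-Python illustration under stated assumptions: the function name `gae_with_bootstrap` and the defaults `gamma=0.99`, `lambda_=0.95` are hypothetical, and this is not RLlib's own postprocessing code.

```python
from typing import List, Sequence, Tuple


def gae_with_bootstrap(
    rewards: Sequence[float],
    values: Sequence[float],
    last_r: float,
    gamma: float = 0.99,
    lambda_: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE over one trajectory fragment (hypothetical helper).

    `values[t]` is the value estimate of the t-th observation. `last_r` is
    0.0 if the episode terminated, else the value estimate of the
    observation after the final step (the bootstrap for a truncated fragment).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    # V(s_{t+1}) for every t; the final step bootstraps from `last_r`.
    next_values = list(values[1:]) + [last_r]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards: A_t = delta_t + gamma * lambda * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lambda_ * running
        advantages[t] = running
    # Value-function targets: A_t + V(s_t).
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets


# Same rewards, terminated (last_r=0.0) vs. truncated (last_r=10.0).
adv_done, _ = gae_with_bootstrap([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                                 last_r=0.0, gamma=1.0, lambda_=1.0)
adv_cut, _ = gae_with_bootstrap([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                                last_r=10.0, gamma=1.0, lambda_=1.0)
print(adv_done)  # [3.0, 2.0, 1.0]
print(adv_cut)   # [13.0, 12.0, 11.0]
```

The usage lines make the point of the `else:` branch concrete: with identical rewards, only the bootstrap value distinguishes a truncated fragment from a finished episode.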
46 changes: 16 additions & 30 deletions rllib/agents/ppo/ppo_torch_policy.py
@@ -210,36 +210,22 @@ def __init__(self, obs_space, action_space, config):
# When doing GAE, we need the value function estimate on the
# observation.
if config["use_gae"]:
# Input dict is provided to us automatically via the Model's
# requirements. It's a single-timestep (last one in trajectory)
# input_dict.
if config["_use_trajectory_view_api"]:

def value(**input_dict):
model_out, _ = self.model.from_batch(
convert_to_torch_tensor(input_dict, self.device),
is_training=False)
# [0] = remove the batch dim.
return self.model.value_function()[0]

# TODO: (sven) Remove once trajectory view API is all-algo default.
else:

def value(ob, prev_action, prev_reward, *state):
model_out, _ = self.model({
SampleBatch.CUR_OBS: convert_to_torch_tensor(
np.asarray([ob]), self.device),
SampleBatch.PREV_ACTIONS: convert_to_torch_tensor(
np.asarray([prev_action]), self.device),
SampleBatch.PREV_REWARDS: convert_to_torch_tensor(
np.asarray([prev_reward]), self.device),
"is_training": False,
}, [
convert_to_torch_tensor(np.asarray([s]), self.device)
for s in state
], convert_to_torch_tensor(np.asarray([1]), self.device))
# [0] = remove the batch dim.
return self.model.value_function()[0]

def value(ob, prev_action, prev_reward, *state):
model_out, _ = self.model({
SampleBatch.CUR_OBS: convert_to_torch_tensor(
np.asarray([ob]), self.device),
SampleBatch.PREV_ACTIONS: convert_to_torch_tensor(
np.asarray([prev_action]), self.device),
SampleBatch.PREV_REWARDS: convert_to_torch_tensor(
np.asarray([prev_reward]), self.device),
"is_training": False,
}, [
convert_to_torch_tensor(np.asarray([s]), self.device)
for s in state
], convert_to_torch_tensor(np.asarray([1]), self.device))
# [0] = remove the batch dim.
return self.model.value_function()[0]

# When not doing GAE, we do not require the value function's output.
else:
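Both `value()` helpers (tf and torch) follow the same pattern: they receive inputs for a single timestep, add a batch dimension of size 1 (`[ob]`, `[prev_action]`, `[prev_reward]`, one entry per state tensor, and `seq_lens` of `[1]`), run the model, and index `[0]` to drop the batch dimension from `value_function()`. The framework-free sketch below shows only that shape contract. `ToyModel`, its `forward()` method, and `make_value_fn` are hypothetical stand-ins, not RLlib classes; the plain string keys merely mirror the `SampleBatch` constants used above.

```python
from typing import Dict, List, Sequence


class ToyModel:
    """Hypothetical stand-in that mimics only the batch-shape contract:
    forward() takes batched inputs, value_function() returns one value
    per batch row."""

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self._values: List[float] = []

    def forward(self, input_dict: Dict[str, object], state: List[list],
                seq_lens: List[int]):
        # One value per row of the (batched) observation input.
        self._values = [self.weight * sum(ob) for ob in input_dict["obs"]]
        return self._values, state

    def value_function(self) -> List[float]:
        return self._values


def make_value_fn(model: ToyModel):
    def value(ob: Sequence[float], prev_action, prev_reward, *state) -> float:
        # Wrap every single-timestep input in a batch of size 1 ...
        model.forward(
            {
                "obs": [ob],
                "prev_actions": [prev_action],
                "prev_rewards": [prev_reward],
                "is_training": False,
            },
            [[s] for s in state],
            [1],  # seq_lens: one sequence of length 1
        )
        # ... then drop that batch dimension again ([0]).
        return model.value_function()[0]

    return value


value = make_value_fn(ToyModel(weight=0.5))
print(value([1.0, 3.0], 0, 0.0))  # 2.0
```

Keeping the model itself strictly batched and doing the wrap/unwrap in a thin closure is what lets the same model serve both training batches and this single-step bootstrap call.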