
Make policies compatible with other/multiple image keys #149

Merged
12 commits merged from policy_compatibility into huggingface:main on May 16, 2024

Conversation

alexander-soare
Contributor

@alexander-soare alexander-soare commented May 8, 2024

What this does

Makes all policies compatible with any key starting with "observation.image" (e.g. Diffusion Policy does not currently work with the ALOHA datasets because they use "observation.images.top" as their image key).

Here's the design approach:

  1. Use the input_shapes policy configuration parameters as the "source of truth" for the expected inputs.
  2. Implicit logic: any key starting with "observation.image" is an image key.
  3. Policies use 1 and 2 to create an image_keys attribute.
  4. image_keys is used to unpack the batch in forward and select_action (see the sketch below).

As a side effect I'm also able to enable multiple image handling in ACT.
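
A minimal sketch of this pattern, assuming a PyTorch policy whose config exposes an input_shapes dict (the class and attribute names here are illustrative, not the exact lerobot API):

import torch
from torch import Tensor


class ExamplePolicy(torch.nn.Module):
    """Illustrative sketch only: derives image keys from the config's input_shapes."""

    def __init__(self, config):
        super().__init__()
        # Steps 1 & 2: `input_shapes` is the source of truth, and any key starting
        # with "observation.image" is treated as an image key.
        self.image_keys = [
            k for k in config.input_shapes if k.startswith("observation.image")
        ]

    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
        # Steps 3 & 4: use `image_keys` to unpack the batch. Stacking along a new
        # dimension yields (batch, num_cameras, channels, height, width).
        images = torch.stack([batch[k] for k in self.image_keys], dim=-4)
        # ... feed `images` to the vision backbone and decode an action ...
        return images.new_zeros(images.shape[0], 1)  # placeholder action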

TODO: Update available_policies_per_env

How it was tested

CI tests were added for ACTPolicy/PushT and DiffusionPolicy/ALOHA. I did not add a test for TD-MPC, as that also needs a "next.reward" key.

For ACT I also tried stacking two of the same image in ACTPolicy._check_and_preprocess_batch to make sure it can handle multiple images.
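
Roughly what that manual check could look like (the exact code inside ACTPolicy._check_and_preprocess_batch is not shown here, so this snippet is an assumption):

import torch

# Hypothetical sanity check: repeat a single camera view along a "num_cameras"
# dimension so the downstream code has to handle more than one image.
batch = {"observation.images.top": torch.zeros(8, 3, 480, 640)}  # dummy batch
images = batch["observation.images.top"]        # (batch, channels, height, width)
images = torch.stack([images, images], dim=-4)  # (batch, 2, channels, height, width)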

@alexander-soare alexander-soare requested a review from Cadene May 8, 2024 11:48
@alexander-soare alexander-soare marked this pull request as draft May 8, 2024 11:56
@alexander-soare alexander-soare changed the title Make Diffusion Policy compatible with other image keys Make policies compatible with other/multiple image keys May 9, 2024
@@ -130,10 +130,3 @@ def __post_init__(self):
raise ValueError(
f"Multiple observation steps not handled yet. Got `nobs_steps={self.n_obs_steps}`"
)
# Check that there is only one image.
Contributor Author

FYI: This check is deleted as ACT can handle multiple images.

(fragment of an ASCII architecture diagram from the diff context)
Contributor Author

FYI: all changes here and below in this file were just me making the code more clear.

@@ -72,6 +80,31 @@ def test_policy(env_name, policy_name, extra_overrides):
+ extra_overrides,
)

# Additional config override logic.
Contributor Author
@alexander-soare alexander-soare May 9, 2024

FYI: This (and in general the idea of testing policies × datasets) scales as O(n²). Not nice. I think we need an O(n) solution, like validating that datasets follow a certain data key format and that policies can handle this format.
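
A rough sketch of what such an O(n) validation could look like (the function name and the allowed prefixes are assumptions, not an existing lerobot API):

ALLOWED_PREFIXES = ("observation.image", "observation.state", "action", "next.reward")


def validate_data_keys(keys: list[str]) -> None:
    """Check that every dataset key follows the shared naming convention."""
    unexpected = [k for k in keys if not k.startswith(ALLOWED_PREFIXES)]
    if unexpected:
        raise ValueError(f"Keys do not follow the expected format: {unexpected}")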

@alexander-soare alexander-soare marked this pull request as ready for review May 9, 2024 11:02
Collaborator
@Cadene Cadene left a comment

Nice!

Comment on lines 86 to 89
image_keys = [k for k in config.input_shapes if k.startswith("observation.image")]
# Note: This check is covered in the post-init of the config but have a sanity check just in case.
assert len(image_keys) == 1
self.input_image_key = image_keys[0]
Collaborator

Suggested change
  image_keys = [k for k in config.input_shapes if k.startswith("observation.image")]
  # Note: This check is covered in the post-init of the config but have a sanity check just in case.
+ # TODO(alexander-soare): make diffusion compatible with multiple image keys
  assert len(image_keys) == 1
  self.input_image_key = image_keys[0]

Contributor Author

Added in main docstring.

Comment on lines +115 to +118
image_keys = [k for k in config.input_shapes if k.startswith("observation.image")]
# Note: This check is covered in the post-init of the config but have a sanity check just in case.
assert len(image_keys) == 1
self.input_image_key = image_keys[0]
Collaborator

Suggested change
  image_keys = [k for k in config.input_shapes if k.startswith("observation.image")]
  # Note: This check is covered in the post-init of the config but have a sanity check just in case.
+ # TODO(alexander-soare): make diffusion compatible with multiple image keys
  assert len(image_keys) == 1
  self.input_image_key = image_keys[0]

Collaborator

or raise NotImplementedError

Contributor Author

Done.

@alexander-soare alexander-soare merged commit 68c1b13 into huggingface:main May 16, 2024
5 checks passed
@alexander-soare alexander-soare deleted the policy_compatibility branch May 16, 2024 12:51
@Joeland4

Nice!

Thanks a lot, @Cadene @alexander-soare. There is still a problem when calling make_dataset for Aloha_xxx datasets while using the diffusion policy; it seems the key 'observation.image' in diffusion.yaml is not consistent with the keys in the datasets.

script:
python lerobot/scripts/train.py policy=diffusion env=aloha env.task=AlohaInsertion-v0 dataset_repo_id=lerobot/aloha_sim_insertion_human

error:
s/lerobot/common/datasets/factory.py", line 53, in make_dataset
dataset.stats[key][stats_type] = torch.tensor(stats, dtype=torch.float32)
KeyError: 'observation.image'

I'd like to offer improvements, but I'm new to the code. I'll try my best and hope you experts can provide a nice improvement.

@alexander-soare
Contributor Author

@Joeland4 my next task is to create an example/tutorial on how to adapt the config. Here's one that should work for you in the meantime (as in it won't raise an exception):

# @package _global_

# Defaults for training for the PushT dataset as per https://github.com/real-stanford/diffusion_policy.
# Note: We do not track EMA model weights as we discovered it does not improve the results. See
#       https://github.com/huggingface/lerobot/pull/134 for more details.

seed: 100000
dataset_repo_id: lerobot/aloha_sim_transfer_cube_human

training:
  offline_steps: 200000
  online_steps: 0
  eval_freq: 5000
  save_freq: 5000
  log_freq: 250
  save_model: true

  batch_size: 64
  grad_clip_norm: 10
  lr: 1.0e-4
  lr_scheduler: cosine
  lr_warmup_steps: 500
  adam_betas: [0.95, 0.999]
  adam_eps: 1.0e-8
  adam_weight_decay: 1.0e-6
  online_steps_between_rollouts: 1
  
  delta_timestamps:
    observation.images.top: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1)]"
    observation.state: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1)]"
    action: "[i / ${fps} for i in range(1 - ${policy.n_obs_steps}, 1 - ${policy.n_obs_steps} + ${policy.horizon})]"

eval:
  n_episodes: 50
  batch_size: 50


policy:
  name: diffusion

  # Input / output structure.
  n_obs_steps: 2
  horizon: 16
  n_action_steps: 8

  input_shapes:
    # TODO(rcadene, alexander-soare): add variables for height and width from the dataset/env?
    observation.images.top: [3, 480, 640]
    observation.state: ["${env.state_dim}"]
  output_shapes:
    action: ["${env.action_dim}"]

  # Normalization / Unnormalization
  input_normalization_modes:
    observation.images.top: mean_std
    observation.state: min_max
  output_normalization_modes:
    action: min_max

  # Architecture / modeling.
  # Vision backbone.
  vision_backbone: resnet18
  crop_shape: [420, 420]
  crop_is_random: True
  pretrained_backbone_weights: null
  use_group_norm: True
  spatial_softmax_num_keypoints: 32
  # Unet.
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: True
  # Noise scheduler.
  num_train_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon # epsilon / sample
  clip_sample: True
  clip_sample_range: 1.0

  # Inference
  num_inference_steps: 100

  # Loss computation
  do_mask_loss_for_padding: false

And here's the script I use to run it.

DATASET=aloha_sim_transfer_cube_human
NAME=diffusion_$DATASET

python lerobot/scripts/train.py \
    hydra.run.dir=outputs/train/$NAME \
    hydra.job.name=$NAME \
    env=aloha \
    env.task=AlohaTransferCube-v0 \
    dataset_repo_id=lerobot/$DATASET \
    policy=diffusion_aloha \
    training.save_model=true \
    training.offline_steps=200000 \
    training.save_freq=20000 \
    training.eval_freq=10000 \
    eval.n_episodes=50 \
    wandb.enable=false \
    wandb.disable_artifact=true \
    device=cuda

@nnop

nnop commented Nov 12, 2024

How did you fuse multiple images? Is there any technical reference for this?
