
Add Zamba2 #34517

Draft · pglorio wants to merge 76 commits into main
Conversation

pglorio (Contributor) commented Oct 30, 2024

What does this PR do?

This PR adds support for the Zamba2 architecture created by Zyphra Technologies.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

pglorio marked this pull request as draft October 30, 2024 17:57
pglorio (Contributor, Author) commented Nov 11, 2024

Hey @Arthur,

Thank you again for your help in getting Zamba2 into transformers! The PR is now finally ready to be reviewed. I added the documentation and all unit tests pass, including slow tests.

A few remarks, mostly related to modular transformers:

  1. To generate the modeling and configuration files I used utils/modular_model_converter.py from a previous commit, because the most recent version of the script (which followed a large refactoring) produces an error that I was not able to fix:
```
Converting src/transformers/models/zamba2/modular_zamba2.py to a single model single file format
Traceback (most recent call last):
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1510, in <module>
    converted_files = convert_modular_file(file_name, args.old_model_name, args.new_model_name)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1447, in convert_modular_file
    for file, module in create_modules(cst_transformers).items():
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1387, in create_modules
    nodes_to_add, file_type, new_imports = get_class_node_and_dependencies(modular_mapper, class_name, node, files)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1337, in get_class_node_and_dependencies
    new_node_dependencies, new_imports = check_dependencies_and_create_import_node(
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in check_dependencies_and_create_import_node
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in <setcomp>
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
KeyError: 'Zamba2Config'
```

I carefully compared Zamba2Config with the configuration classes of other models that also use modular (such as Gemma2Config) and they appear to have a consistent format. Relatedly, the utils/modular_model_converter.py in the current PR (path) is the version from the previous commit mentioned above.

  2. After running utils/modular_model_converter.py, the generated modeling and configuration files contain unintended code that I had to update. All these modifications are in this commit. In particular, the produced modeling file contains Zamba2DynamicCache, which is the correct cache for Zamba2, as well as HybridMambaAttentionDynamicCache, which is the cache of Zamba and is not relevant to Zamba2, so I deleted HybridMambaAttentionDynamicCache and related references.

  3. I ran make fixup and all zamba-related tests pass, with the exception of python utils/check_modular_conversion.py. This check doesn't pass because of the modifications mentioned in the previous point.

  4. I slightly edited the Zamba2MambaMixer compared to the original Mamba2Mixer of mamba2; the main difference is that I added these lines, which were necessary to appropriately process the mamba2 cache (note this step already existed in the torch forward in these lines).

Looking forward to your feedback. Thanks so much!

pglorio mentioned this pull request Jan 7, 2025
pglorio (Contributor, Author) commented Jan 14, 2025

Hi @Cyrilvallez and @ArthurZucker,

I updated the attention forward to the new standard of transformers here and here.

I ran all final tests, including @slow tests, and everything appears to pass!

@Cyrilvallez (Member) left a comment

Nice work for the refactor! Almost ready, left some final comments but overall quite nice! 🤗

Comment on lines 1415 to 1417
```python
"ZambaModelTester",
"Zamba2ModelTester",
"RwkvModelTester",
```
Member:

cc @ydshieh here to ensure this change is necessary, as I'm not familiar with this new part!

Contributor Author (pglorio):

@ydshieh for context: when running this test, the config of the model is forced to have num_hidden_layers=1, but other parameters of the config are not updated accordingly, so the model errors out at initialization because these params are left inconsistent. I imagine that's also the reason why Zamba was added to this list.
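As a purely illustrative sketch (not the actual test code) of the mismatch described above, assuming this PR's branch is installed and that Zamba2Config sizes its depth-dependent fields from the original number of layers:

```python
from transformers import Zamba2Config  # assumes this PR's branch is installed

config = Zamba2Config()            # layers_block_type is sized for the full-depth model
config.num_hidden_layers = 1       # forced by the test harness
# Other depth-dependent fields (e.g. layers_block_type) are not shrunk to match,
# so building the model with this config fails during initialization.
assert len(config.layers_block_type) == config.num_hidden_layers  # no longer holds
```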

pglorio (Contributor, Author) commented Jan 16, 2025

Thank you @Cyrilvallez for the review. I addressed the comments above, although there are a couple of pending points.

All zamba-related tests appear to pass.

pglorio (Contributor, Author) commented Jan 17, 2025

Hello @Cyrilvallez, I ran all model tests on two GPUs and after a couple of minor fixes everything appears to work now. I'm skipping this test as it gives an error related to mamba2 kernels. I indeed verified that mamba2 skips that test here.

Separately, when running utils/check_modular_conversion.py I get the following error:

Differences found between the generated code and src/transformers/models/zamba2/modeling_zamba2.py:

```diff
--- src/transformers/models/zamba2/modeling_zamba2.py_generated
+++ src/transformers/models/zamba2/modeling_zamba2.py
@@ -313,6 +313,13 @@
     return attn_output, attn_weights
 
 
+def rotate_half(x):
+    """Rotates half the hidden dims of the input."""
+    x1 = x[..., : x.shape[-1] // 2]
+    x2 = x[..., x.shape[-1] // 2 :]
+    return torch.cat((-x2, x1), dim=-1)
+
+
 def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
     """Applies Rotary Position Embedding to the query and key tensors.
 
@@ -338,13 +345,6 @@
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
     return q_embed, k_embed
-
-
-def rotate_half(x):
-    """Rotates half the hidden dims of the input."""
-    x1 = x[..., : x.shape[-1] // 2]
-    x2 = x[..., x.shape[-1] // 2 :]
-    return torch.cat((-x2, x1), dim=-1)
```

which I was not getting before, even though this part of the code was unchanged.

@Cyrilvallez (Member) left a comment

LGTM! Let's just wait for #35795 which will get rid of the CI failure for modular conversion! Sorry about that, and thanks for being so patient with us 🙏🙏🤗
Great work!

pglorio (Contributor, Author) commented Jan 21, 2025

Awesome, sounds good!

@ArthurZucker (Collaborator) left a comment

Thanks! A few comments about the code paths and the regex init, and it should be good!

Comment on lines +34 to +37
Zamba2 requires you use `transformers` version 4.46.0 or higher:
```bash
pip install transformers>=4.46.0
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:

Zamba2 requires you use `transformers` version 4.48.0 or higher:
```bash
pip install transformers>=4.48.0
```

Comment on lines +93 to +101
```python
def layer_type_list(config: Zamba2Config):
    """
    Returns list of layer ids containing hybrid layers
    """
    output_list = []
    for index, type in enumerate(config.layers_block_type):
        if type == "hybrid":
            output_list.append(index)
    return output_list
```
Collaborator:

I don't understand why we have this when we can simply store the explicit list in the config?
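For illustration only (a sketch of the reviewer's suggestion, not code from the PR), the hybrid-layer ids could be derived inline or stored on the config directly, assuming config.layers_block_type is the list of strings shown in the snippet above:

```python
# Hedged sketch: compute the hybrid layer ids where they are needed
# instead of going through a free-standing helper function.
hybrid_layer_ids = [i for i, block_type in enumerate(config.layers_block_type) if block_type == "hybrid"]
```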

Comment on lines +82 to +89
def count_mem_blocks_in_config(config: Zamba2Config):
"""
Count number of shared blocks
"""
num_gs = 0
for val in config.layers_block_type:
if val == "hybrid":
num_gs += 1
Collaborator:

Same for this one, plus it's only used once; not sure it's worth a helper.
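As a hedged sketch of the inlining the comment hints at (assuming layers_block_type is a plain list of strings, as in the snippet above):

```python
# Count the shared ("hybrid") blocks at the single call site instead of using a helper.
num_gs = config.layers_block_type.count("hybrid")
```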

Comment on lines +132 to +150
```python
        self.conv_states = {
            i: torch.zeros(
                batch_size,
                self.intermediate_size + 2 * config.mamba_ngroups * config.mamba_d_state,
                self.conv_kernel_size,
                device=device,
                dtype=dtype,
            )
            for i in range(config.num_hidden_layers)
        }
        self.ssm_states = {
            i: torch.zeros(
                batch_size, self.n_mamba_heads, config.mamba_headdim, self.ssm_state_size, device=device, dtype=dtype
            )
            for i in range(config.num_hidden_layers)
        }
        for i in range(config.num_hidden_layers):
            if self.layers_block_type[i] == "hybrid":
                self.transformer_layers.append(i)
```
Collaborator:

a single for loop should suffice here
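A minimal sketch of what the single-loop version could look like, using only names that appear in the snippet above (illustrative, meant as a drop-in for the cache's __init__ body, not the PR's actual code):

```python
# Build the per-layer cache tensors and the hybrid-layer index list in one pass.
self.conv_states, self.ssm_states, self.transformer_layers = {}, {}, []
for i in range(config.num_hidden_layers):
    self.conv_states[i] = torch.zeros(
        batch_size,
        self.intermediate_size + 2 * config.mamba_ngroups * config.mamba_d_state,
        self.conv_kernel_size,
        device=device,
        dtype=dtype,
    )
    self.ssm_states[i] = torch.zeros(
        batch_size, self.n_mamba_heads, config.mamba_headdim, self.ssm_state_size, device=device, dtype=dtype
    )
    if self.layers_block_type[i] == "hybrid":
        self.transformer_layers.append(i)
```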

```python
        self.self_attn = Zamba2Attention(config, layer_idx=-1, num_fwd_mem_blocks=num_gs, block_id=block_id)
        self.feed_forward = Zamba2MLP(config, num_fwd_mem_blocks=num_gs, block_id=block_id)

    def forward(
```
Collaborator:

forward can be the same as ZambaAttentionDecoderLayer no?
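For illustration, in the modular format the suggestion would roughly look like the sketch below; the surrounding classes and helpers (ZambaAttentionDecoderLayer, Zamba2Attention, Zamba2MLP, count_mem_blocks_in_config) and the parent __init__ signature are assumed from the PR, not reproduced here:

```python
# Hedged sketch, not the PR's actual code: by not redefining forward, the modular
# converter reuses ZambaAttentionDecoderLayer.forward in the generated file.
class Zamba2AttentionDecoderLayer(ZambaAttentionDecoderLayer):
    def __init__(self, config: Zamba2Config, block_id: int = None, layer_idx: Optional[int] = None):
        super().__init__(config, layer_idx)  # assumed signature of the Zamba parent __init__
        num_gs = count_mem_blocks_in_config(config)
        self.block_id = block_id
        self.self_attn = Zamba2Attention(config, layer_idx=-1, num_fwd_mem_blocks=num_gs, block_id=block_id)
        self.feed_forward = Zamba2MLP(config, num_fwd_mem_blocks=num_gs, block_id=block_id)
    # no forward defined here -> inherited from ZambaAttentionDecoderLayer
```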

Collaborator:

We can probably remove the cache_positions, as they are not used in either modeling file.

```python
        self, shared_transformer: Zamba2AttentionDecoderLayer, linear: nn.Linear, mamba: Zamba2MambaDecoderLayer
    ):
        super().__init__(shared_transformer, linear, mamba)
        del self.shared_transf
```
Collaborator:

wow I wish I caught this when reviewing the original model 🤣

Comment on lines +980 to +1000
```python
ZAMBA2_START_DOCSTRING = r"""
    This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
    library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
    etc.)

    This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
    Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
    and behavior.

    Parameters:
        config ([`Zamba2Config`]):
            Model configuration class with all the parameters of the model. Initializing with a config file does not
            load the weights associated with the model, only the configuration. Check out the
            [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""


@add_start_docstrings(
    "The bare Zamba2 Model outputting raw hidden-states without any specific head on top.",
    ZAMBA2_START_DOCSTRING,
)
```
Collaborator:

pretty sure removing them can work with auto renaming!

Comment on lines +1109 to +1112
```python
@add_start_docstrings(
    "The bare Zamba2 Model outputting raw hidden-states without any specific head on top.",
    ZAMBA2_START_DOCSTRING,
)
```
Collaborator:

Same here: if they take the same inputs as Zamba, you don't need to write these explicitly.

"shared_transformer.pre_ff_layernorm.weight",
]
self._tied_weights_keys = [*self._tied_weights_keys, *[prefix_name + key for key in tied_keys]]
if self.config.use_shared_mlp_adapter:
Collaborator:

Same comment about the code path: which models have this set to True / False?

Collaborator:

  • Tied weight keys support regex patterns; we should never have to add all of them manually like this (see the sketch below).
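A hedged sketch of what the regex-based version might look like, assuming _tied_weights_keys entries are matched as patterns against parameter names (as the reviewer indicates); the exact pattern is illustrative, not from the PR:

```python
# Illustrative only: one pattern covering every shared-block copy of the key,
# instead of enumerating prefix_name + key for each block.
self._tied_weights_keys = [
    *self._tied_weights_keys,
    r"shared_transformer\.pre_ff_layernorm\.weight",
]
```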

```python
        , dtype=torch.float32)  # fmt: skip

        torch.testing.assert_close(logits[0, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_0, rtol=1e-3, atol=1e-3)
        torch.testing.assert_close(logits[1, -1, :40].cpu(), EXPECTED_LOGITS_NO_GRAD_1, rtol=1e-3, atol=1e-3)
```
Collaborator:

It's missing a test on CPU with the slow forward!
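For illustration, a test along these lines could exercise the pure-torch (slow) mamba path on CPU; the checkpoint id, prompt, and expected length check are placeholders, not values from this PR:

```python
# Hedged sketch of the kind of CPU integration test the reviewer asks for.
import torch
from transformers import AutoTokenizer, Zamba2ForCausalLM  # assumes this PR's branch is installed

def test_generate_cpu_slow_forward():
    model_id = "Zyphra/Zamba2-1.2B"  # placeholder checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = Zamba2ForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32).to("cpu")
    inputs = tokenizer("Hey how are you doing on this lovely evening?", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    # On CPU the optimized mamba2 kernels are unavailable, so this runs the slow forward.
    assert out.shape[-1] == inputs.input_ids.shape[-1] + 10
```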
