
Add custom head_dim support to Llama #32502

Closed
wants to merge 12 commits

Conversation

@suhara (Contributor) commented Aug 7, 2024

What does this PR do?

Llama assumes that head_dim * num_heads == hidden_size and does not accommodate models with a custom head_dim. This PR relaxes that assumption so that Llama can use a custom head_dim.
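
For illustration, a minimal sketch (with made-up sizes; this is not the actual diff in this PR) of what the relaxed assumption means for the attention projections: their shapes are derived from num_heads * head_dim rather than from hidden_size, and o_proj maps back from num_heads * head_dim to hidden_size.

import torch.nn as nn

# Hypothetical sizes where head_dim != hidden_size // num_heads (3072 // 32 == 96).
hidden_size, num_heads, num_kv_heads, head_dim = 3072, 32, 8, 128

# Query/key/value projections are sized by num_heads * head_dim instead of hidden_size.
q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)
v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)
# The output projection maps num_heads * head_dim back to hidden_size.
o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)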

This PR has a dependency on the following PR:

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines 353 to 358
if config.head_dim is None:
    if (self.head_dim * self.num_heads) != self.hidden_size:
        raise ValueError(
            f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
            f" and `num_heads`: {self.num_heads})."
        )
@amyeroberts (Collaborator) commented:

Suggested change
-if config.head_dim is None:
-    if (self.head_dim * self.num_heads) != self.hidden_size:
-        raise ValueError(
-            f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-            f" and `num_heads`: {self.num_heads})."
-        )
+if config.head_dim is None and (self.head_dim * self.num_heads) != self.hidden_size:
+    raise ValueError(
+        f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+        f" and `num_heads`: {self.num_heads})."
+    )

@suhara (Contributor Author) replied:

@amyeroberts Thanks for the suggestion! Updated the if block accordingly.

@amyeroberts (Collaborator) commented:

cc @ArthurZucker

@suhara suhara marked this pull request as ready for review August 7, 2024 22:09
@ArthurZucker (Collaborator) left a comment:

Hey! Not sure we need this (what I meant by a regression is that I thought we already allowed a custom head dim for Llama). Other models had this constraint lifted, like Gemma, I think.

@suhara (Contributor Author) commented Aug 8, 2024

Hi @ArthurZucker

The motivation is that some Llama-architecture models with custom head_dim sizes cannot be loaded by LlamaModel because of this constraint. (Some context: NVIDIA/NeMo#10078)

That said, I understand your concern about a regression. What would you suggest? If the existing Llama class is meant to support only the official Llama models, would creating a new class to cover custom Llama-based variants be an option?

@ArthurZucker (Collaborator) left a comment:

The suggestions should fix the CI; let's go with this.
Sorry for the delayed review, I was OOO for a bit.

@@ -187,6 +191,7 @@ def __init__(
         self.attention_bias = attention_bias
         self.attention_dropout = attention_dropout
         self.mlp_bias = mlp_bias
+        self.head_dim = head_dim
Collaborator commented:

What you can do here is: `if head_dim is None: self.head_dim = self.hidden_size // self.num_heads`
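
A minimal sketch of that fallback pattern (illustrative only; TinyLlamaLikeConfig is a made-up name, not the actual LlamaConfig):

class TinyLlamaLikeConfig:
    """Toy stand-in for a Llama-style config, showing only the head_dim fallback."""

    def __init__(self, hidden_size=4096, num_attention_heads=32, head_dim=None):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        # Fall back to the classic hidden_size // num_heads when head_dim is not
        # given, so configs that never set the new field behave exactly as before.
        self.head_dim = head_dim if head_dim is not None else hidden_size // num_attention_heads

With that default, TinyLlamaLikeConfig().head_dim is 128 (4096 // 32), while TinyLlamaLikeConfig(head_dim=96) overrides it explicitly.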

@suhara (Contributor Author) replied:

Thanks for the suggestion! Added.

@ArthurZucker (Collaborator) commented:

Cool, can you just run `make fixup` and `make fix-copies`?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker mentioned this pull request Aug 16, 2024
@suhara (Contributor Author) commented Aug 16, 2024

@ArthurZucker It seems that you created another PR and fixed the remaining issues for this. Thank you!

@suhara (Contributor Author) commented Aug 17, 2024

Hi @ArthurZucker

I saw @Qubitium's message in #32857. The newly created PR is missing the fix for o_proj.
I fixed the CI issues and this PR should be ready to merge. Can you check?

Thanks!

@suhara suhara force-pushed the suhara/llama-kv-channels branch from e0af552 to 06cc89d Compare August 18, 2024 18:55
@suhara (Contributor Author) commented Aug 19, 2024

Hi @ArthurZucker

The CI passed. Can you merge this PR (or #32857 after fixing the issue)? Thanks!

@suhara (Contributor Author) commented Aug 20, 2024

#32857 has been merged. Closing this PR.

@suhara suhara closed this Aug 20, 2024
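
For anyone landing here later, a hedged usage sketch of what the change enables once it is in a released transformers version (sizes below are arbitrary; exact behaviour depends on the installed version):

from transformers import LlamaConfig, LlamaModel

# Assumes a transformers release that includes this change; older releases may
# not honour a custom head_dim. Here 3072 // 32 == 96, but we ask for
# head_dim=128. Small layer/vocab sizes keep the example lightweight.
config = LlamaConfig(
    hidden_size=3072,
    num_attention_heads=32,
    num_key_value_heads=8,
    head_dim=128,
    num_hidden_layers=2,
    intermediate_size=512,
    vocab_size=1024,
)
model = LlamaModel(config)
print(model.config.head_dim)  # 128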
helunwencser added a series of commits to pytorch/executorch that referenced this pull request (Nov 14 – Nov 25, 2024) with the following message:

Pull Request resolved: #6872

This is for resolving the ask in this [post](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1574875706716050/).

Similar change in HF: huggingface/transformers#32502

Differential Revision: [D65974454](https://our.internmc.facebook.com/intern/diff/D65974454/)

Co-authored-by: Lunwen He <lwhecser@gmail.com>