
Allow-head-dim #32857

Merged 7 commits from allow-head-dim into main on Aug 20, 2024

Conversation

ArthurZucker
Collaborator

What does this PR do?

Supersedes #32502 to support head_dim in the Llama model for convenience.
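
For context, a minimal usage sketch (toy values chosen for illustration, assuming a transformers version that includes this change): head_dim can now be set explicitly in the config instead of always being derived as hidden_size // num_attention_heads.

from transformers import LlamaConfig, LlamaForCausalLM

# Toy config for illustration: head_dim is set explicitly rather than
# derived as hidden_size // num_attention_heads (which would be 96 here).
config = LlamaConfig(
    hidden_size=3072,
    intermediate_size=9216,
    num_hidden_layers=2,
    num_attention_heads=32,
    num_key_value_heads=8,
    head_dim=128,
)
model = LlamaForCausalLM(config)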

@amyeroberts
Collaborator

amyeroberts left a comment

LGTM - thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Qubitium
Contributor

Qubitium commented Aug 17, 2024

@ArthurZucker

PR #32502 has no problem, but this PR produces the following stack trace with https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base:

Traceback (most recent call last):
  File "/root/miniconda3/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/evaluator.py", line 198, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/api/model.py", line 147, in create_from_arg_string
    return cls(**args, **args2)
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 184, in __init__
    self._create_model(
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 571, in _create_model
    self._model = self.AUTO_MODEL_CLASS.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3833, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1116, in __init__
    self.model = LlamaModel(config)
                 ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 902, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 902, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 689, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 351, in __init__
    raise ValueError(
ValueError: hidden_size must be divisible by num_heads (got `hidden_size`: 3072 and `num_heads`: 32).

commit f623b89c375f2a9d5d5a7166495003cf3d83ccdd
Author: Arthur Zucker <arthur.zucker@gmail.com>
Date:   Fri Aug 16 20:55:50 2024 +0200
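
For what it's worth, the message is a bit misleading: 3072 is divisible by 32 (giving 96). The check that raises appears to be the pre-existing assertion that head_dim * num_heads equals hidden_size, which no longer holds once head_dim is read from the checkpoint config (apparently 128 for this Minitron model). A rough sketch of the failing logic with those numbers plugged in (illustrative, not the exact source):

# Illustrative reconstruction of the old sanity check in LlamaAttention.__init__;
# the exact code may differ.
hidden_size = 3072
num_heads = 32
head_dim = 128  # now taken from the config instead of hidden_size // num_heads (96)

if head_dim * num_heads != hidden_size:  # 128 * 32 = 4096 != 3072, so this raises
    raise ValueError(
        f"hidden_size must be divisible by num_heads (got `hidden_size`: {hidden_size}"
        f" and `num_heads`: {num_heads})."
    )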

@suhara
Contributor

suhara commented Aug 17, 2024

The output projection layer is not configured properly; this line is missing from this PR:

self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)

Please see #32502

https://github.com/huggingface/transformers/pull/32502/files#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51R359

in addition to the updated assertion block
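
For reference, a rough sketch (simplified, not the exact diff from #32502) of how the attention init can derive head_dim from the config and size the projections so that o_proj maps num_heads * head_dim back to hidden_size:

import torch.nn as nn

class LlamaAttentionSketch(nn.Module):
    # Simplified sketch of the head_dim handling; the real class lives in
    # transformers/models/llama/modeling_llama.py and has more logic.
    def __init__(self, config, layer_idx=None):
        super().__init__()
        self.hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.num_key_value_heads = config.num_key_value_heads
        # Fall back to the old derivation when the config has no head_dim.
        self.head_dim = getattr(config, "head_dim", self.hidden_size // self.num_heads)

        bias = config.attention_bias
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=bias)
        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=bias)
        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=bias)
        # The line pointed out above: o_proj takes num_heads * head_dim back to hidden_size.
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=bias)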

@suhara
Contributor

suhara commented Aug 17, 2024

Hi @ArthurZucker

Now #32502 passes the CI and should be ready to merge. Can you check?

@bzantium
Contributor

I think it serves the same purpose as #32847.

ArthurZucker and others added 4 commits August 19, 2024 14:54
Co-authored-by: Suhara <suhara@users.noreply.github.com>
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
@ArthurZucker
Collaborator Author

@suhara and @bzantium let me add both of you as co-authors and merge this one 😉

@ArthurZucker merged commit 13e645b into main on Aug 20, 2024
23 checks passed
@ArthurZucker deleted the allow-head-dim branch on August 20, 2024 at 08:24