
Allow-head-dim #32857

Merged 7 commits from allow-head-dim into main on Aug 20, 2024

Conversation

ArthurZucker
Collaborator

What does this PR do?

Supersedes #32502 to support head_dim in the Llama model for convenience.
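
For context, a minimal usage sketch (toy values chosen for illustration, assuming a transformers version that includes this change): head_dim can now be set explicitly in the config instead of always being derived as hidden_size // num_attention_heads.

from transformers import LlamaConfig, LlamaForCausalLM

# Toy config for illustration: head_dim is set explicitly rather than
# derived as hidden_size // num_attention_heads (which would be 96 here).
config = LlamaConfig(
    hidden_size=3072,
    intermediate_size=9216,
    num_hidden_layers=2,
    num_attention_heads=32,
    num_key_value_heads=8,
    head_dim=128,
)
model = LlamaForCausalLM(config)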

@amyeroberts
Collaborator

amyeroberts left a comment

LGTM - thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Qubitium
Contributor

Qubitium commented Aug 17, 2024

@ArthurZucker

PR #32502 has no problem, but this PR produces the following stack trace with https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base:

Traceback (most recent call last):
  File "/root/miniconda3/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/evaluator.py", line 198, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/api/model.py", line 147, in create_from_arg_string
    return cls(**args, **args2)
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 184, in __init__
    self._create_model(
  File "/root/miniconda3/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 571, in _create_model
    self._model = self.AUTO_MODEL_CLASS.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3833, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1116, in __init__
    self.model = LlamaModel(config)
                 ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 902, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 902, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 689, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 351, in __init__
    raise ValueError(
ValueError: hidden_size must be divisible by num_heads (got `hidden_size`: 3072 and `num_heads`: 32).

commit f623b89c375f2a9d5d5a7166495003cf3d83ccdd
Author: Arthur Zucker <arthur.zucker@gmail.com>
Date:   Fri Aug 16 20:55:50 2024 +0200
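
For what it's worth, the message is a bit misleading: 3072 is divisible by 32 (giving 96). The check that raises appears to be the pre-existing assertion that head_dim * num_heads equals hidden_size, which no longer holds once head_dim is read from the checkpoint config (apparently 128 for this Minitron model). A rough sketch of the failing logic with those numbers plugged in (illustrative, not the exact source):

# Illustrative reconstruction of the old sanity check in LlamaAttention.__init__;
# the exact code may differ.
hidden_size = 3072
num_heads = 32
head_dim = 128  # now taken from the config instead of hidden_size // num_heads (96)

if head_dim * num_heads != hidden_size:  # 128 * 32 = 4096 != 3072, so this raises
    raise ValueError(
        f"hidden_size must be divisible by num_heads (got `hidden_size`: {hidden_size}"
        f" and `num_heads`: {num_heads})."
    )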

@suhara
Contributor

suhara commented Aug 17, 2024

The output projection layer is not configured properly; this line is missing from this PR:

self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)

Please see #32502

https://github.com/huggingface/transformers/pull/32502/files#diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51R359

in addition to the updated assertion block
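
For reference, a rough sketch (simplified, not the exact diff from #32502) of how the attention init can derive head_dim from the config and size the projections so that o_proj maps num_heads * head_dim back to hidden_size:

import torch.nn as nn

class LlamaAttentionSketch(nn.Module):
    # Simplified sketch of the head_dim handling; the real class lives in
    # transformers/models/llama/modeling_llama.py and has more logic.
    def __init__(self, config, layer_idx=None):
        super().__init__()
        self.hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.num_key_value_heads = config.num_key_value_heads
        # Fall back to the old derivation when the config has no head_dim.
        self.head_dim = getattr(config, "head_dim", self.hidden_size // self.num_heads)

        bias = config.attention_bias
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=bias)
        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=bias)
        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=bias)
        # The line pointed out above: o_proj takes num_heads * head_dim back to hidden_size.
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=bias)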

@suhara
Contributor

suhara commented Aug 17, 2024

Hi @ArthurZucker

Now #32502 passes the CI and should be ready to merge. Can you check?

@bzantium
Contributor

I think it serves the same purpose as #32847.

ArthurZucker and others added 4 commits August 19, 2024 14:54
Co-authored-by: Suhara <suhara@users.noreply.github.com>
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
@ArthurZucker
Collaborator Author

@suhara and @bzantium let me add both of you as co-authors and merge this one 😉

@ArthurZucker merged commit 13e645b into main on Aug 20, 2024
23 checks passed
@ArthurZucker deleted the allow-head-dim branch on August 20, 2024 at 08:24