
Error occurred while running _compute_llama3_parameters in modeling_rope_utils.py with torch.device('meta') #32187

Closed
kimsan0622 opened this issue Jul 24, 2024 · 2 comments · Fixed by #32244

@kimsan0622

System Info

  • transformers version: 4.44.0.dev0
  • Platform: Linux-5.4.0-131-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.24.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A100-SXM4-80GB

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code:

import torch
from transformers import LlamaConfig, LlamaForCausalLM

llama_config = LlamaConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
with torch.device("meta"):
    model = LlamaForCausalLM(llama_config)
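
For context, constructing a model under torch.device("meta") creates parameters that carry shape and dtype but no storage, which is useful for deferred initialization (for example, before sharding or before loading a checkpoint). A minimal illustration of this standard PyTorch behavior:

import torch

# Parameters created under the meta device carry metadata only, no memory.
with torch.device("meta"):
    layer = torch.nn.Linear(4096, 4096)
print(layer.weight.device)   # meta
print(layer.weight.is_meta)  # True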

Error message:

[rank3]: Traceback (most recent call last):                                                                                                 
[rank3]:   File "/mnt/nfs4/nlp/ft_llms/finetuning.py", line 299, in <module>                                                                
[rank3]:     fire.Fire(main)                                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 143, in Fire                                                   
[rank3]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)                                                      
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire                                                  
[rank3]:     component, remaining_args = _CallAndUpdateTrace(                                                                               
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace                                    
[rank3]:     component = fn(*varargs, **kwargs)                                                                                             
[rank3]:   File "/mnt/nfs4/nlp/ft_llms/finetuning.py", line 123, in main                                                                    
[rank3]:     model = LlamaForCausalLM(llama_config)                                                                                         
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1067, in __init__               
[rank3]:     self.model = LlamaModel(config)                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 846, in __init__                
[rank3]:     [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]                                        
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 846, in <listcomp>              
[rank3]:     [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]                                        
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 633, in __init__                
[rank3]:     self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)                      
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 306, in __init__                
[rank3]:     self.rotary_emb = LlamaRotaryEmbedding(config=self.config)                                                                     
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 119, in __init__                
[rank3]:     inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, **self.rope_kwargs)                                  
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 330, in _compute_llama3_parameters      
[rank3]:     if wavelen < high_freq_wavelen:                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__                            
[rank3]:     return func(*args, **kwargs)                                                                                                   
[rank3]: NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no abstract impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add an abstract impl. Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit
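
The failing line in the traceback, if wavelen < high_freq_wavelen:, is a data-dependent Python branch over tensor values. Evaluating a tensor comparison inside a Python if forces scalar extraction via aten::_local_scalar_dense, which has no meta kernel. A minimal standalone sketch of this failure mode (my reading of the traceback, not code from transformers; the variable names are stand-ins):

import math
import torch

with torch.device("meta"):
    inv_freq = torch.ones(4)  # stand-in for the RoPE inverse frequencies

high_freq_wavelen = 1.0       # stand-in threshold
for freq in inv_freq:         # each element is a 0-dim meta tensor
    wavelen = 2 * math.pi / freq          # tensor math on meta is fine
    try:
        if wavelen < high_freq_wavelen:   # needs a concrete bool -> fails
            pass
    except NotImplementedError as exc:
        print(exc)
        break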

Expected behavior

We expect that the Llama 3.1 model can also be instantiated successfully on PyTorch's meta device.

In the process of creating the RoPE embedding for Llama 3.1, the _compute_llama3_parameters function from modeling_rope_utils.py is called. However, this function does not work correctly on the meta device type and raises the NotImplementedError shown in the traceback above.

The related commit, d5a99df, was merged one day ago.
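
For illustration, here is a hedged sketch (not the actual patch in #32244) of how the Llama 3.1 frequency smoothing can be expressed with tensor-level selection instead of Python branches, so that it also runs on meta tensors. The parameter names and defaults follow the Llama 3.1 rope_scaling config fields; the function itself is an assumption, not the transformers implementation:

import math
import torch

def scale_inv_freq_llama3(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0, old_context_len=8192):
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths are scaled down, short ones kept as-is ...
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # ... and the medium band is interpolated smoothly between the two.
    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_medium = (wavelen >= high_freq_wavelen) & (wavelen <= low_freq_wavelen)
    return torch.where(is_medium, smoothed, scaled)

Because every branch is expressed through torch.where, no tensor value is ever read on the Python side, and the function runs unmodified on meta tensors.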

kimsan0622 added the bug label on Jul 24, 2024
@ArthurZucker
Collaborator

Indeed, thanks for reporting. cc @gante as well

@gante
Member

gante commented Jul 26, 2024

@kimsan0622 thank you for reporting! 🤗

#32244 should fix it 🙌
