
Error occurred while running _compute_llama3_parameters in modeling_rope_utils.py with torch.device('meta') #32187

Closed
kimsan0622 opened this issue Jul 24, 2024 · 2 comments · Fixed by #32244

@kimsan0622

System Info

  • transformers version: 4.44.0.dev0
  • Platform: Linux-5.4.0-131-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.24.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A100-SXM4-80GB

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code:

import torch
from transformers import LlamaConfig, LlamaForCausalLM

llama_config = LlamaConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
with torch.device("meta"):
    model = LlamaForCausalLM(llama_config)
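
For context, constructing a model under torch.device("meta") creates parameters that carry shape and dtype but no storage, which is useful for deferred initialization (for example, before sharding or before loading a checkpoint). A minimal illustration of this standard PyTorch behavior:

import torch

# Parameters created under the meta device carry metadata only, no memory.
with torch.device("meta"):
    layer = torch.nn.Linear(4096, 4096)
print(layer.weight.device)   # meta
print(layer.weight.is_meta)  # True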

Error message:

[rank3]: Traceback (most recent call last):                                                                                                 
[rank3]:   File "/mnt/nfs4/nlp/ft_llms/finetuning.py", line 299, in <module>                                                                
[rank3]:     fire.Fire(main)                                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 143, in Fire                                                   
[rank3]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)                                                      
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire                                                  
[rank3]:     component, remaining_args = _CallAndUpdateTrace(                                                                               
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace                                    
[rank3]:     component = fn(*varargs, **kwargs)                                                                                             
[rank3]:   File "/mnt/nfs4/nlp/ft_llms/finetuning.py", line 123, in main                                                                    
[rank3]:     model = LlamaForCausalLM(llama_config)                                                                                         
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1067, in __init__               
[rank3]:     self.model = LlamaModel(config)                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 846, in __init__                
[rank3]:     [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]                                        
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 846, in <listcomp>              
[rank3]:     [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]                                        
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 633, in __init__                
[rank3]:     self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)                      
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 306, in __init__                
[rank3]:     self.rotary_emb = LlamaRotaryEmbedding(config=self.config)                                                                     
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 119, in __init__                
[rank3]:     inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, **self.rope_kwargs)                                  
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 330, in _compute_llama3_parameters      
[rank3]:     if wavelen < high_freq_wavelen:                                                                                                
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__                            
[rank3]:     return func(*args, **kwargs)                                                                                                   
[rank3]: NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no abstract impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add an abstract impl. Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit
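
The failing line in the traceback, if wavelen < high_freq_wavelen:, is a data-dependent Python branch over tensor values. Evaluating a tensor comparison inside a Python if forces scalar extraction via aten::_local_scalar_dense, which has no meta kernel. A minimal standalone sketch of this failure mode (my reading of the traceback, not code from transformers; the variable names are stand-ins):

import math
import torch

with torch.device("meta"):
    inv_freq = torch.ones(4)  # stand-in for the RoPE inverse frequencies

high_freq_wavelen = 1.0       # stand-in threshold
for freq in inv_freq:         # each element is a 0-dim meta tensor
    wavelen = 2 * math.pi / freq          # tensor math on meta is fine
    try:
        if wavelen < high_freq_wavelen:   # needs a concrete bool -> fails
            pass
    except NotImplementedError as exc:
        print(exc)
        break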

Expected behavior

We expect that the Llama 3.1 model can also be instantiated successfully on PyTorch's meta device.

In the process of creating the RoPE embedding for Llama 3.1, the _compute_llama3_parameters function from modeling_rope_utils.py is called. However, this function does not work correctly on the meta device type and raises the NotImplementedError shown in the traceback above.

The related commit, d5a99df, was merged one day ago.
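
For illustration, here is a hedged sketch (not the actual patch in #32244) of how the Llama 3.1 frequency smoothing can be expressed with tensor-level selection instead of Python branches, so that it also runs on meta tensors. The parameter names and defaults follow the Llama 3.1 rope_scaling config fields; the function itself is an assumption, not the transformers implementation:

import math
import torch

def scale_inv_freq_llama3(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0, old_context_len=8192):
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths are scaled down, short ones kept as-is ...
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # ... and the medium band is interpolated smoothly between the two.
    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_medium = (wavelen >= high_freq_wavelen) & (wavelen <= low_freq_wavelen)
    return torch.where(is_medium, smoothed, scaled)

Because every branch is expressed through torch.where, no tensor value is ever read on the Python side, and the function runs unmodified on meta tensors.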

kimsan0622 added the bug label on Jul 24, 2024
@ArthurZucker
Collaborator

Indeed, thanks for reporting. cc @gante as well

@gante
Member

gante commented Jul 26, 2024

@kimsan0622 thank you for reporting! 🤗

#32244 should fix it 🙌
