Huawei NPU device_map=auto doesn't split model evenly over all devices #2368
Comments
You should set the `NPU_VISIBLE_DEVICES` and `ASCEND_RT_VISIBLE_DEVICES` environment variables before importing `torch_npu`.
Thanks for the reply! Setting `NPU_VISIBLE_DEVICES` and `ASCEND_RT_VISIBLE_DEVICES` before the import worked! Another key point, as we can see, is that it doesn't split the weights evenly over all available devices. Any ideas about this?
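For reference, a minimal sketch of the ordering fix described above. The visibility variables are read when `torch_npu` is imported, so they must be set first; the `torch_npu` import is commented out here because it only works on Ascend hardware.

```python
import os

# Set device visibility BEFORE importing torch/torch_npu; the runtime reads
# ASCEND_RT_VISIBLE_DEVICES at import time, so setting it later has no effect.
os.environ["NPU_VISIBLE_DEVICES"] = "0,1,2,3"
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "0,1,2,3"

# import torch
# import torch_npu  # requires Ascend hardware; shown for import ordering only

print(os.environ["ASCEND_RT_VISIBLE_DEVICES"])
```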
@lichangW Something seems to be wrong with torch_npu, as it is not properly sending tensors to the different devices.
cc @fakeYan
Hi, is there any update on this issue? ^_^ @fakeYan @statelesshz
Would be great to know if there's any progress on solving this issue... @statelesshz
@lichangW Have you solved this problem? Thanks.
@junior-zsy @lichangW This issue will be fixed in the next version of torch_npu (at the latest in April).
It seems Huawei's stack doesn't support single-process multi-card here; changing that is a pain, so for now I can only run inference on a single card.
Thanks for the reply. I just tried the latest torch_npu version v6.0.rc1.alpha001-pytorch2.2.0 and the problem still exists.
I'm working on inference:

```python
import os
os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch_npu
import accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

torch.npu.set_device(['npu:0', 'npu:1'])
model_path = "/path/to/01ai/Yi-34b-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto")
# error: `AttributeError: module 'torch_npu.npu' has no attribute 'mem_get_info'`

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map={"npu:0": "60GiB", "npu:1": "60GiB"})
# another error: `ValueError: model.embed_tokens.weight doesn't have any device set.`

model = model.half().npu().eval()
```

and I got the above two errors 😭 (torch==2.1.0)
Same error, any suggestions?
This issue will be fixed in the official torch-npu&cann release at the end of April. |
Will the issue still be fixed as scheduled at the end of April? Thanks! @statelesshz
Me too.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info

Information

Tasks
- `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
Hi, I followed the great work in this PR: #2222, and it seems the model always puts most of the weights on device 0 when `device_map="auto"` or `device_map="balanced"` is set (I tried llama2-7b, llama2-13b, and bloom-7b). Additionally, it's slower than using a single device, and neither `NPU_VISIBLE_DEVICES` nor `CUDA_VISIBLE_DEVICES` can control the number of visible devices.
```python
import os
import time

import torch
import torch_npu
import transformers
import accelerate
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, LlamaConfig
from transformers import (
    BloomForCausalLM,
    LlamaForCausalLM,
)
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'

model_path = "/home/wanglichang/models/models--meta-llama--Llama-2-13b-chat-hf"
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```
Expected behavior
According to https://huggingface.co/docs/accelerate/concept_guides/big_model_inference, the weights should be split evenly across devices. Please also point me to the environment variable that controls the number of visible devices, many thanks!