
Huawei NPU device_map=auto doesn't split model evenly over all devices #2368

Closed
lichangW opened this issue Jan 22, 2024 · 17 comments

@lichangW

System Info

ascend 910B, CANN=7.0, torch=2.1, torch_npu=2.1, accelerate=0.26.1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Hi, I followed the great work in PR #2222, but the model always puts most of its weights on device 0 when device_map="auto" or device_map="balanced" is set (I tried llama2-7b, llama2-13b, and bloom-7b).
Additionally, it is slower than using a single device, and NPU_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES cannot control the number of visible devices.

```python
import os
import time

import torch
import torch_npu
import transformers
import accelerate
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, LlamaConfig
from transformers import BloomForCausalLM, LlamaForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'
model_path = "/home/wanglichang/models/models--meta-llama--Llama-2-13b-chat-hf"
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

(screenshots omitted; they showed device memory usage concentrated on device 0)
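For what it's worth, one way to check how the weights were actually placed (assuming the model was loaded with device_map as above) is to print the device map that transformers records on the model:

```python
# hf_device_map maps each dispatched submodule to the device it was placed on;
# if almost every entry points at device 0, the split is indeed uneven.
print(model.hf_device_map)
```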

Expected behavior

According to https://huggingface.co/docs/accelerate/concept_guides/big_model_inference, the weights should be split evenly across devices. Please also tell me which environment variable controls the number of visible devices, many thanks!

@muellerzr
Collaborator

You should set NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before any import of torch/accelerate/torch_npu.
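For reference, a minimal sketch of that ordering (the model path is a placeholder):

```python
import os

# Set device visibility before torch/torch_npu/accelerate are imported,
# otherwise the runtime may have already enumerated the devices.
os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch_npu
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "/path/to/Llama-2-13b-chat-hf"  # placeholder
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```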

@lichangW
Author

> You should set NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before any import of torch/accelerate/torch_npu.

Thanks for the reply! Setting NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before the imports worked! The other key point, as you can see, is that the weights are still not split evenly over all available devices; any ideas about this?

@ji-huazhong
Contributor

@lichangW Something seems to be wrong with torch_npu as it is not properly sending tensors to different devices

@ji-huazhong
Contributor

cc @fakeYan

@lichangW
Author

lichangW commented Jan 25, 2024

> @lichangW Something seems to be wrong with torch_npu as it is not properly sending tensors to different devices

Thanks for the reply! I compiled torch_npu from the v2.1.0 branch according to #2222, and I need more suggestions to make it work properly.

@lichangW
Author

lichangW commented Jan 30, 2024

Hi, is there any update on this issue? ^_^ @fakeYan @statelesshz

@Wayfarer123

Would be great to know if there's any progress on solving this issue... @statelesshz

@junior-zsy

@lichangW Have you solved this problem? Thanks.

@ji-huazhong
Contributor

@junior-zsy @lichangW This issue will be fixed in the next version of torch_npu (at the latest in April)

@ZhuoranLyu

It seems Huawei doesn't support single-process multi-card here, and it's a pain to change; for now inference only works on a single card.

@lichangW
Author

lichangW commented Mar 5, 2024

> @junior-zsy @lichangW This issue will be fixed in the next version of torch_npu (at the latest in April)

Thanks for the reply. I just tried the latest torch_npu version, v6.0.rc1.alpha001-pytorch2.2.0, and the problem still exists.

@xiabo0816

I'm working on inference with 01ai/Yi-34b-Chat on Ascend 910B3. Because loading needs at least 2 devices, my inference code is:

```python
import os
os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'
import torch
import torch_npu
import accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

torch.npu.set_device(['npu:0', 'npu:1'])
model_path = "/path/to/01ai/Yi-34b-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto")
# error: `AttributeError: module 'torch_npu.npu' has no attribute 'mem_get_info'`

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map={"npu:0": "60GiB", "npu:1": "60GiB"})
# another error: `ValueError: model.embed_tokens.weight doesn't have any device set.`

model = model.half().npu().eval()
```

and I got the above two errors 😭

torch==2.1.0
torch-npu==2.1.0rc1
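Not sure it addresses the root cause, but the second call above passes a memory budget where device_map expects a mapping from module names to devices; the usual way to cap per-device memory while letting Accelerate compute the placement is the max_memory argument. A sketch, reusing the 60GiB figures from the snippet above:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate infer the placement; max_memory caps what
# each device may receive (keys are device indices, or "cpu" for offload).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "60GiB", 1: "60GiB"},
)
```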

@Sander-houqi

Same error, any suggestions?

@ji-huazhong
Contributor

This issue will be fixed in the official torch-npu&cann release at the end of April.
cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

@fuzhenxin

fuzhenxin commented Apr 29, 2024

> This issue will be fixed in the official torch-npu&cann release at the end of April. cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

Will the issue still be fixed as scheduled at the end of April? Thanks! @statelesshz

@sunnyregion

> This issue will be fixed in the official torch-npu&cann release at the end of April. cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

> Will the issue still be fixed as scheduled at the end of April? Thanks! @statelesshz

me too


github-actions bot commented Jun 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Jun 9, 2024