
Huawei NPU device_map=auto doesn't split model evenly over all devices #2368

Closed
lichangW opened this issue Jan 22, 2024 · 17 comments

@lichangW

System Info

ascend 910B, CANN=7.0, torch=2.1, torch_npu=2.1, accelerate=0.26.1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Hi, I followed the great work in PR #2222, but the model always puts most of its weights on device 0 when device_map="auto" or device_map="balanced" is set (I tried llama2-7b, llama2-13b, and bloom-7b).
Additionally, it is slower than using a single device, and NPU_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES cannot control the number of visible devices.

```python
import os
import time

import torch
import torch_npu
import transformers
import accelerate
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, LlamaConfig
from transformers import BloomForCausalLM, LlamaForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'
model_path = "/home/wanglichang/models/models--meta-llama--Llama-2-13b-chat-hf"
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

(screenshots omitted; they showed device memory usage concentrated on device 0)
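For what it's worth, one way to check how the weights were actually placed (assuming the model was loaded with device_map as above) is to print the device map that transformers records on the model:

```python
# hf_device_map maps each dispatched submodule to the device it was placed on;
# if almost every entry points at device 0, the split is indeed uneven.
print(model.hf_device_map)
```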

Expected behavior

According to https://huggingface.co/docs/accelerate/concept_guides/big_model_inference, the weights should be split evenly across devices. Please also tell me which environment variable controls the number of visible devices, many thanks!

@muellerzr
Collaborator

You should set NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before any import of torch/accelerate/torch_npu.
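For reference, a minimal sketch of that ordering (the model path is a placeholder):

```python
import os

# Set device visibility before torch/torch_npu/accelerate are imported,
# otherwise the runtime may have already enumerated the devices.
os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch_npu
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "/path/to/Llama-2-13b-chat-hf"  # placeholder
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```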

@lichangW
Author

> You should set NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before any import of torch/accelerate/torch_npu.

Thanks for the reply! Setting NPU_VISIBLE_DEVICES and ASCEND_RT_VISIBLE_DEVICES before the imports worked! The other key point, as you can see, is that the weights are still not split evenly over all available devices; any ideas about this?

@ji-huazhong
Contributor

@lichangW Something seems to be wrong with torch_npu as it is not properly sending tensors to different devices

@ji-huazhong
Contributor

cc @fakeYan

@lichangW
Author

lichangW commented Jan 25, 2024

> @lichangW Something seems to be wrong with torch_npu as it is not properly sending tensors to different devices

Thanks for the reply! I compiled torch_npu from the v2.1.0 branch according to #2222, and I need more suggestions to make it work properly.

@lichangW
Author

lichangW commented Jan 30, 2024

Hi, is there any update on this issue? ^_^ @fakeYan @statelesshz

@Wayfarer123

Would be great to know if there's any progress on solving this issue... @statelesshz

@junior-zsy

@lichangW Have you solved this problem? Thanks.

@ji-huazhong
Contributor

@junior-zsy @lichangW This issue will be fixed in the next version of torch_npu (at the latest in April)

@ZhuoranLyu

It seems Huawei doesn't support single-process multi-card here, and it's a pain to change; for now inference only works on a single card.

@lichangW
Author

lichangW commented Mar 5, 2024

> @junior-zsy @lichangW This issue will be fixed in the next version of torch_npu (at the latest in April)

Thanks for the reply. I just tried the latest torch_npu version, v6.0.rc1.alpha001-pytorch2.2.0, and the problem still exists.

@xiabo0816

I'm working on inference with 01ai/Yi-34b-Chat on Ascend 910B3. Because loading needs at least 2 devices, my inference code is:

```python
import os
os.environ['NPU_VISIBLE_DEVICES'] = '0,1,2,3'
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0,1,2,3'
import torch
import torch_npu
import accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, Accelerator, infer_auto_device_map

torch.npu.set_device(['npu:0', 'npu:1'])
model_path = "/path/to/01ai/Yi-34b-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto")
# error: `AttributeError: module 'torch_npu.npu' has no attribute 'mem_get_info'`

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map={"npu:0": "60GiB", "npu:1": "60GiB"})
# another error: `ValueError: model.embed_tokens.weight doesn't have any device set.`

model = model.half().npu().eval()
```

and I got the above two errors 😭

torch==2.1.0
torch-npu==2.1.0rc1
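Not sure it addresses the root cause, but the second call above passes a memory budget where device_map expects a mapping from module names to devices; the usual way to cap per-device memory while letting Accelerate compute the placement is the max_memory argument. A sketch, reusing the 60GiB figures from the snippet above:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate infer the placement; max_memory caps what
# each device may receive (keys are device indices, or "cpu" for offload).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "60GiB", 1: "60GiB"},
)
```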

@Sander-houqi

Same error, any suggestions?

@ji-huazhong
Contributor

This issue will be fixed in the official torch-npu&cann release at the end of April.
cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

@fuzhenxin

fuzhenxin commented Apr 29, 2024

> This issue will be fixed in the official torch-npu&cann release at the end of April. cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

Will the issue still be fixed as scheduled at the end of April? Thanks! @statelesshz

@sunnyregion

> This issue will be fixed in the official torch-npu&cann release at the end of April. cc @Sander-houqi @xiabo0816 @junior-zsy @ZhuoranLyu @lichangW

> Will the issue still be fixed as scheduled at the end of April? Thanks! @statelesshz

me too


github-actions bot commented Jun 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Jun 9, 2024