RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) #510

xieziyi881 · 2024-06-19T08:10:34Z

`from awq import AutoAWQForCausalLM
from awq.utils.utils import get_best_device
from transformers import AutoTokenizer, TextStreamer

quant_path = "/workspace/awq_model"

if get_best_device() == "cpu":
    model = AutoAWQForCausalLM.from_quantized(quant_path, use_qbits=True, fuse_layers=False)
else:
    model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True, device_map="balanced")
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
# Initialize the streaming output handler
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Parentheses join the three string literals into one prompt
prompt = (
    "You're standing on the surface of the Earth. "
    "You walk one mile south, one mile west and one mile north. "
    "You end up exactly where you started. Where are you?"
)

chat = [
    {"role": "system", "content": "You are a concise assistant that helps answer questions."},
    {"role": "user", "content": prompt},
]

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

tokens = tokenizer.apply_chat_template(
    chat,
    return_tensors="pt",
)
tokens = tokens.to("cuda:0")

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=64,
    eos_token_id=terminators,
)
`

Here's my script for running the quantized model. However, I get the error shown in the title. How can I fix it?
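One way to narrow this kind of error down is to check which devices the weights actually landed on after loading with `device_map="balanced"`. The following is a minimal sketch and not part of the original script: `summarize_devices` is a hypothetical helper, and the sample parameter names are made up for illustration. With a real model you would feed it `(name, str(p.device))` pairs from PyTorch's `named_parameters()`, assuming the loaded object exposes that method.

```python
from collections import defaultdict


def summarize_devices(named_devices):
    """Group parameter names by the device string they live on.

    named_devices: iterable of (parameter_name, device_string) pairs.
    Returns a dict mapping device_string -> list of parameter names.
    """
    by_device = defaultdict(list)
    for name, device in named_devices:
        by_device[device].append(name)
    return dict(by_device)


# Made-up sample standing in for a model sharded over several GPUs.
sample = [
    ("model.embed_tokens.weight", "cuda:0"),
    ("model.layers.0.self_attn.q_proj.qweight", "cuda:0"),
    ("model.layers.31.mlp.down_proj.qweight", "cuda:7"),
    ("lm_head.weight", "cuda:7"),
]

summary = summarize_devices(sample)
for device, names in sorted(summary.items()):
    print(f"{device}: {len(names)} tensors, e.g. {names[0]}")

# With a real PyTorch module (assumption: the object is an nn.Module):
# summary = summarize_devices(
#     (name, str(p.device)) for name, p in model.named_parameters()
# )
```

If more than one device shows up, any tensor created by hand (such as `tokens`) has to sit on the same device as the layer that consumes it.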

ryan0980 · 2024-07-25T08:58:49Z

You can try:

    import os
    import torch

    # Set this before CUDA is initialized in the process.
    os.environ['CUDA_VISIBLE_DEVICES'] = '6'
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model.to(device)

Works for me.
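For context on why this workaround helps: `CUDA_VISIBLE_DEVICES` renumbers the GPUs the process can see, so with the value `'6'` the only visible device is physical GPU 6 and it is addressed as `cuda:0`. The sketch below only illustrates that mapping in plain Python. `logical_to_physical` is a hypothetical helper, it assumes a comma-separated list of integer indices (the variable also accepts GPU UUIDs, which are not handled here), and it does not query CUDA.

```python
def logical_to_physical(visible_devices):
    """Map logical CUDA names (cuda:0, cuda:1, ...) to physical GPU ids.

    visible_devices: the string assigned to CUDA_VISIBLE_DEVICES, e.g. "6" or "6,7".
    Assumes plain integer indices separated by commas.
    """
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    return {f"cuda:{logical}": physical for logical, physical in enumerate(ids)}


print(logical_to_physical("6"))    # {'cuda:0': 6}
print(logical_to_physical("6,7"))  # {'cuda:0': 6, 'cuda:1': 7}
```

Since everything then lives on a single device, the mismatch cannot occur, but this presumably only helps when the whole model fits on one GPU.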

ArlanCooper · 2024-12-03T13:00:11Z

I have to load the Qwen2.5-72B model across two GPUs. How can I quantize it?

casper-hansen mentioned this issue on Aug 5, 2024: Recent changes is causing "found at least two devices" (huggingface/transformers#32420, Closed)

davedgd mentioned this issue on Oct 4, 2024: fix for "two devices" issue due to RoPE changes (#630, Merged)
