Recent changes are causing "found at least two devices" #32420
Comments
Thanks for reporting, could you try with the latest release?
I tested this today, and it's still an issue with the latest transformers release (v4.44.2) at the time of writing.
😢 I don't know if this is accelerate or not, so pinging @muellerzr as well
Looks like AWQ is another model that can't be fast-loaded. Will put in a fix
Potentially. I'm not too familiar with the AWQ codebase. The PR that likely broke this is here: #31771. In the model definition we need to set
@muellerzr For reference, this error occurs when we want to run inference to quantize the model. I have not received similar reports for inference of quantized models. So in other words, this is happening when running models in BF16/FP16.

EDIT: Model loading is quite standard use of the auto class in transformers:

```python
@classmethod
def from_pretrained(
    self,
    model_path: Annotated[str, Doc("A Huggingface path or local path to a model.")],
    model_type: Annotated[str, Doc("The model type, loaded from config.json.")],
    torch_dtype: Annotated[
        torch.dtype,
        Doc(
            "The dtype to load the model as. May not work with other values than float16."
        ),
    ] = torch.float16,
    trust_remote_code: Annotated[
        bool,
        Doc(
            "Useful for Huggingface repositories that have not been integrated into transformers yet."
        ),
    ] = True,
    safetensors: Annotated[
        bool, Doc("Whether to download/load safetensors instead of torch weights.")
    ] = True,
    device_map: Annotated[
        Union[str, Dict],
        Doc(
            "A device map that will be passed onto the model loading method from transformers."
        ),
    ] = None,
    download_kwargs: Annotated[
        Dict,
        Doc("Used to configure the model download."),
    ] = None,
    **model_init_kwargs: Annotated[
        Dict,
        Doc(
            "Additional kwargs that are passed to the model during initialization."
        ),
    ],
):
    """A method for initialization of pretrained models, usually in FP16."""
    # Get weights path and quant config
    model_weights_path, config, quant_config = self._load_config(
        self,
        model_path,
        "",
        safetensors,
        trust_remote_code=trust_remote_code,
        download_kwargs=download_kwargs,
    )

    target_cls_name = TRANSFORMERS_AUTO_MAPPING_DICT[config.model_type]
    target_cls = getattr(transformers, target_cls_name)

    processor = None
    if target_cls_name == "AutoModelForVision2Seq":
        processor = AutoProcessor.from_pretrained(model_weights_path)
        processor: CLIPImageProcessor = processor.image_processor

    # If not quantized, must load with AutoModelForCausalLM
    model = target_cls.from_pretrained(
        model_weights_path,
        trust_remote_code=trust_remote_code,
        torch_dtype=torch_dtype,
        use_safetensors=safetensors,
        device_map=device_map,
        **model_init_kwargs,
    )

    model.eval()

    return self(
        model,
        model_type,
        is_quantized=False,
        config=config,
        quant_config=quant_config,
        processor=processor,
    )
```
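For context, a typical call path that triggers the error during quantization is sketched below. This is a minimal, assumed usage example following AutoAWQ's documented quantization flow (the repository ID and quantization config values are placeholders), not an excerpt from the reports above:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "some-org/some-fp16-model"  # placeholder repo ID

# This ends up in the from_pretrained method quoted above; device_map defaults
# to None, so device placement is left to transformers/accelerate.
model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantization runs FP16 inference over calibration data; if parts of the model
# are offloaded to CPU while inputs sit on CUDA, the rotary-embedding matmul
# raises the "found at least two devices" error described in this issue.
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
)
```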
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Has there been any fix for this? This is also affecting AutoGPTQ: #729
Fixes huggingface#32420 by placing both `inv_freq_expanded` and `position_ids_expanded` on the same device. This avoids the following error on this line:

`freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)`

`Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)`

Allows AutoAWQ and other packages to correctly perform CPU offloading during quantization.
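As an illustration only (this is not the exact patch; the helper name and surrounding code are simplified), the kind of change being described amounts to moving one of the two tensors onto the other's device before the matmul:

```python
import torch

def rope_frequencies(inv_freq: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
    """Sketch of the rotary-embedding frequency computation with the device fix.

    inv_freq may live on CPU when layers are offloaded while position_ids is on
    the GPU; aligning the devices avoids the "found at least two devices" error
    in the batched matmul.
    """
    # Expand to [batch, dim/2, 1] and [batch, 1, seq_len], as in the
    # transformers rotary embedding, and force both onto the same device.
    inv_freq_expanded = (
        inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        .to(position_ids.device)
    )
    position_ids_expanded = position_ids[:, None, :].float()

    # The matmul that previously raised the cpu/cuda mismatch.
    return (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)


# Tiny usage example on CPU (the mismatch itself only appears with CPU
# offloading combined with CUDA inputs).
if __name__ == "__main__":
    dim, base = 8, 10000.0
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    position_ids = torch.arange(16)[None, :]
    print(rope_frequencies(inv_freq, position_ids).shape)  # torch.Size([1, 16, 4])
```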
I've just added a pull request for a patch that I believe resolves this issue. Feel free to try installing this patch to confirm the fix: https://github.com/davedgd/transformers/tree/patch-1
@muellerzr: Please note I did try to set
I am also encountering this issue when using dynamic rope scaling, and here is what's happening:
This can be fixed with a simple change like this: trevor-m@1a7e62a
Thanks for the nice report! It seems that there is indeed a device mismatch here. However, one point I don't get is why the
Let me try to make a small reproducer. I wonder if using
Please ignore my earlier comment (I deleted it to avoid confusion) -- it turns out there are multiple issues with AutoAWQ that are complicating my testing of relevant fixes, and I need to more thoroughly evaluate what's going on to figure out the best solution(s).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
+1, qwen2.5-72b-instruct
Has this been solved?
Is it the same as #35505? 🤗
System Info
transformers 4.43.3, python 3.10, linux
Who can help?
@ArthurZucker
Reproduction
I have received multiple reports that model loading behaviour recently changed, which is causing a device error. This can usually be fixed by specifying the `device_map` (a minimal sketch of this workaround follows the list below), but prior to recent changes (I don't know when this happened), the model was loaded and could run inference without any issues on multiple GPUs.

Referenced issues:
casper-hansen/AutoAWQ#510
casper-hansen/AutoAWQ#558
casper-hansen/AutoAWQ#571
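A minimal sketch of the workaround mentioned above (the model ID is a placeholder, and only standard transformers loading is shown rather than the AutoAWQ-specific loader):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "some-org/some-model"  # placeholder

# Default behaviour that reportedly leads to the error downstream:
# device_map=None leaves device placement implicit during later quantization.
# model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Reported workaround: pass an explicit device_map so every module gets a
# concrete device assignment up front.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
```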
Expected behavior
The expected behavior is that we do not see these errors with the default setting of `device_map=None`. I am generally not sure what exactly changed, so it is hard to be more precise.