multi-gpu fix #668

casper-hansen · 2024-12-03T19:23:49Z

Closes #667 #666 #662 #652 #638 (if not, please let me know in a new issue with traceback + code).

davedgd · 2024-12-04T06:50:03Z

Not sure if you’ve already fully tested this, but tomorrow I will run some tests, similar to those for the old PR, to see how things look, and will let you know. Really appreciate your work on this!

davedgd · 2024-12-04T19:05:00Z

Just confirmed this works perfectly using the recommended approach, i.e.,

model = AutoAWQForCausalLM.from_pretrained(model_path)

Tested with both one and two GPUs (the former was slightly faster, likely due to less data transfer between devices).
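For context, a minimal sketch of a full quantization run built around the one-line load above might look like the following. The paths and the quant_config values are placeholders chosen for illustration, not settings taken from this PR, and the imports are deferred into the function so the snippet can be loaded on a machine without AutoAWQ installed.

```python
# Placeholder values (assumptions for illustration, not taken from this PR).
MODEL_PATH = "path/to/model"
QUANT_PATH = "path/to/model-awq"
QUANT_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}


def quantize(model_path=MODEL_PATH, quant_path=QUANT_PATH, quant_config=QUANT_CONFIG):
    """Quantize a model with AutoAWQ, loading it with only the model path."""
    # Imported lazily so this file can be loaded without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # Load as in the comment above: no extra device arguments are passed.
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Run AWQ quantization, then save the quantized weights and the tokenizer.
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

Calling `quantize()` with real paths runs the whole flow; how the work is spread across one or two GPUs is left to the library.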

Jerryzhao-z · 2024-12-06T13:01:39Z

I reproduced the "two devices" error while quantizing a Qwen2 7B LoRA model with the latest version.

deps:

transformers==4.47.0
accelerate==1.1.1

I made two CUDA devices available and tracked down this error. The AlignDevicesHook.execution_device attached to model.rotary_emb is still cuda:1 after awq.models.qwen2.move_embed moves model.rotary_emb to cuda:0, so the hook moves the activations back to cuda:1 during the forward pass of the rotary_emb module.

After removing this [move_embed] code, the quantization task ran.

Hope it helps.

Jerryzhao-z · 2024-12-07T11:43:49Z

Sorry, I didn't follow the recommended usage examples. It works perfectly now with:

model = AutoAWQForCausalLM.from_pretrained(model_path)

initial multi-gpu fix

9f197be

casper-hansen mentioned this pull request Dec 3, 2024

fix for "two devices" issue due to RoPE changes #630

Merged

casper-hansen added 5 commits December 3, 2024 19:30

minimum transformers 4.45.0

7d9b539

added support for all models

d9a3c39

standardize device_map=None

18c5208

standardize low_cpu_mem_usage and use_cache

dc77595

Fix multi-GPU generation

621b24a

casper-hansen mentioned this pull request Dec 3, 2024

Model quantize error #598

Open

updated runpod script

8b53a84

casper-hansen merged commit f2171f3 into main Dec 3, 2024

This was referenced Dec 3, 2024

Failed to convert qwen1.5-32b model！ #666

Closed

Bugs in AWQ models deployed in multiple GPUs. #662

Closed

Cannot copy out of meta tensor; no data! when half process #652

Closed

Example for quantize in multiple GPU's #638

Closed

casper-hansen deleted the fix-multi-gpu branch December 30, 2024 21:23

haitham-boxmind mentioned this pull request Jan 7, 2025

Multi-GPU/CPU offloading is still not working as intended #689

Open
