load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

icoicqico · 2024-04-23T13:08:06Z

Hello, it seems like using HQQModelForCausalLM.from_pretrained cant load the model using device_map or load in GPU causing the computer crash due to not enough RAM. But when I use the original AutoModelForCausalLM, I can pass device map and then it will offload layers between CPU and GPU and it won't crash. Because of this I am unable to use this library to load large model. Is there any method to solve this? Thanks.

mobicham · 2024-04-23T13:23:05Z

Hi @icoicqico , yes correct, because the current implementation loads the whole model on CPU first before quantizing.
It's in the to-do list to load from disk instead of loading from the RAM.
Which model do you want to use?

icoicqico · 2024-04-23T13:55:16Z

Thanks for the reply, I am trying to finetune Mixtral 8x7b and llama2 70b.

mobicham · 2024-04-23T14:00:35Z

Mixtral: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bitgs8-metaoffload-HQQ
70B chat: https://huggingface.co/mobiuslabsgmbh/Llama-2-70b-chat-hf-2bit_g16_s128-HQQ

icoicqico · 2024-04-25T05:22:53Z

Thanks for the quantized checkpoint, I tried to use the Mixtral-8x7B-v0.1-hf-4bit_g64-HQQ from your huggingface repo, and when I try to train it, I got an error,
line 659, in dequantize_Wq_aten
return hqq_aten.dequantize(
AttributeError: 'NoneType' object has no attribute 'dequantize

mobicham · 2024-04-25T08:57:13Z

It didn't install the CUDA backend, what kind of GPU do you have?
Otherwise, try:
HQQLinear.set_backend(HQQBackend.PYTORCH_COMPILE)

icoicqico · 2024-04-25T08:59:56Z

Thanks for the reply. I have RTX A6000 x 2, I will try to use the backend you mentioned, thanks.

mobicham · 2024-04-25T09:02:43Z

Then it's fine, it should work fine with a single A6000 !
Can you try:
import hqq_aten

icoicqico · 2024-04-25T09:16:18Z

Then it's fine, it should work fine with a single A6000 ! Can you try: import hqq_aten

Importing this causing ModuleNotFoundError, I install the package with pip install git+https://github.com/mobiusml/hqq.git.

mobicham · 2024-04-25T09:21:15Z

That confirms it, the CUDA backend is not installed. Can you try:

git clone https://github.com/mobiusml/hqq.git;
cd hqq/kernels/;
python setup_cuda.py install; 
cd ../..;

Let me know what kind of error you get

icoicqico · 2024-04-25T09:26:30Z

That confirms it, the CUDA backend is not installed. Can you try:
git clone https://github.com/mobiusml/hqq.git;
cd hqq/kernels/;
python setup_cuda.py install; 
cd ../..;
Let me know what kind of error you get

The detected CUDA version (12.0) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
Maybe because I am using CUDA 12.0?

mobicham · 2024-04-25T09:29:29Z

Yeah, you have an older Pytorch version. Try to update to nightly build and make sure you use CUDA 12.1

mobicham · 2024-05-03T08:02:28Z

You can now use device_map with hqq + transformers: huggingface/transformers#29637

mobicham closed this as completed May 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

icoicqico commented Apr 23, 2024 •

edited

Loading

mobicham commented Apr 23, 2024

icoicqico commented Apr 23, 2024

mobicham commented Apr 23, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

mobicham commented May 3, 2024

load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

Comments

icoicqico commented Apr 23, 2024 • edited Loading

mobicham commented Apr 23, 2024

icoicqico commented Apr 23, 2024

mobicham commented Apr 23, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

icoicqico commented Apr 25, 2024

mobicham commented Apr 25, 2024

mobicham commented May 3, 2024

icoicqico commented Apr 23, 2024 •

edited

Loading