load the model into GPU or device_map using HQQModelForCausalLM.from_pretrained? #61

Hello, it seems that HQQModelForCausalLM.from_pretrained can't load the model onto the GPU or take a device_map, so the machine crashes from running out of RAM. But when I use the original AutoModelForCausalLM, I can pass a device_map and it offloads layers between CPU and GPU without crashing. Because of this I am unable to use this library to load large models. Is there any method to solve this? Thanks.
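For contrast, the working path described above is the standard transformers offload pattern. A minimal sketch, assuming a large checkpoint like the one discussed later in the thread (the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Plain transformers load with automatic CPU/GPU offloading (requires accelerate).
# device_map="auto" shards layers across GPU and CPU so the full model never
# has to fit in GPU memory at once.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # illustrative model id
    torch_dtype=torch.float16,
    device_map="auto",
)
```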
Comments
Hi @icoicqico, yes, correct: the current implementation loads the whole model on CPU first, before quantizing.
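To make that concrete, here is a sketch of the CPU-first flow, assuming the hqq API from around the time of this thread (exact signatures may differ between versions):

```python
from hqq.engine.hf import HQQModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig

# from_pretrained materializes the *full-precision* weights on CPU first.
# For Mixtral 8x7B that is roughly 90 GB in fp16, which is what exhausts
# RAM on smaller machines.
model = HQQModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Only after this step are the layers quantized (and moved to the GPU).
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
model.quantize_model(quant_config=quant_config)
```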
Thanks for the reply. I am trying to fine-tune Mixtral 8x7B and Llama-2 70B.
Thanks for the quantized checkpoint. I tried to use Mixtral-8x7B-v0.1-hf-4bit_g64-HQQ from your Hugging Face repo, and when I tried to train it I got an error.
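For reference, a prequantized checkpoint like this one is loaded with from_quantized rather than from_pretrained, which skips the CPU-side full-precision load; the repo id below is assumed from the checkpoint name in the thread:

```python
from hqq.engine.hf import HQQModelForCausalLM

# Loads the already-quantized weights directly; no full-precision CPU pass.
model = HQQModelForCausalLM.from_quantized(
    "mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-4bit_g64-HQQ"  # assumed repo id
)
```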
It didn't install the CUDA backend. What kind of GPU do you have?
Thanks for the reply. I have 2x RTX A6000; I will try the backend you mentioned, thanks.
Then it's fine, it should work with a single A6000!
Importing this causes a ModuleNotFoundError. I installed the package with pip install git+https://github.com/mobiusml/hqq.git.
That confirms it, the CUDA backend is not installed. Can you try:
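The snippet itself did not survive the export. Judging from the compile error quoted below, it was presumably along these lines: build the CUDA kernels from source and check that the extension imports (the paths and module name are assumptions based on the repo layout of that era):

```
# assumed reconstruction of the suggested steps
git clone https://github.com/mobiusml/hqq.git
cd hqq/hqq/kernels
python setup_cuda.py install

# verify the compiled extension is importable
python -c "import hqq_aten"
```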
Let me know what kind of error you get.
"The detected CUDA version (12.0) mismatches the version that was used to compile …"
Yeah, you have an older PyTorch version. Try updating to a nightly build and make sure you use CUDA 12.1.
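At the time, a CUDA 12.1 nightly could be installed with something like the following (the index URL is PyTorch's standard nightly channel):

```
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
```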
You can now use …
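The comment above is cut off in the export; presumably it refers to the freshly compiled backend. Switching hqq to it looks roughly like this (the backend name is an assumption based on hqq's API of that era):

```python
from hqq.core.quantize import HQQLinear, HQQBackend

# Use the compiled ATEN/CUDA kernels instead of the pure-PyTorch backend.
HQQLinear.set_backend(HQQBackend.ATEN)
```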