Cannot load Bloom-7b1 ggml model in GPU #3697
Comments
I am not able to load the llava model ggml-model-q4_k.gguf (with mmproj mmproj-model-f16.gguf) into the GPU either. (main) llama v2 works fine. `clip_model_load: text_encoder: 0` `prompt: 'describe the image'`
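For reference, an invocation along these lines should reproduce the failure; the binary name, flags, image path, and layer count below are assumptions based on the llava example in llama.cpp at the time, not details taken from this report:

```sh
# Hypothetical reproduction (paths and image are illustrative):
./llava -m ggml-model-q4_k.gguf \
    --mmproj mmproj-model-f16.gguf \
    --image test.jpg \
    -p "describe the image" \
    -ngl 32   # offloading layers to the NVIDIA GPU is what triggers the failure
```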
Is there any solution for this? I found that models using ALiBi all seem to have this issue on NVIDIA GPUs; they run successfully on Metal.
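One way to check whether a given GGUF file uses an ALiBi-based architecture is to read its metadata. A minimal sketch, assuming the `gguf` Python package that ships with llama.cpp (`gguf-py`); the set of ALiBi architectures listed here is illustrative, not exhaustive:

```python
from gguf import GGUFReader

# Architectures that use ALiBi position encoding (illustrative, not exhaustive).
ALIBI_ARCHS = {"bloom", "mpt", "refact"}

reader = GGUFReader("ggml-model-f16.gguf")
field = reader.fields["general.architecture"]
# String fields store their bytes at parts[data[0]].
arch = bytes(field.parts[field.data[0]]).decode("utf-8")
print(f"architecture: {arch}, uses ALiBi: {arch in ALIBI_ARCHS}")
```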
As a temporary workaround, you can add
Fixed in #3921
I used the `convert-bloom-hf-to-gguf.py` file to convert the Hugging Face `bigscience/bloom-7b1` model to a GGML model with `f16` successfully. This gives me a model `ggml-model-f16.gguf` that correctly loads and runs on the CPU. However, when I try to offload a layer to the GPU, I get the error shown under Failure Logs below.
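For concreteness, the conversion and run steps were presumably along these lines; the script arguments, prompt, and layer count are assumptions, not commands copied from the report:

```sh
# Convert the HF checkpoint to an f16 GGUF model (the trailing 1 selects f16 in
# the convert-*-hf-to-gguf.py scripts of that era; assumed, check the script's usage).
python convert-bloom-hf-to-gguf.py ./bloom-7b1 1

# CPU-only inference works:
./main -m ggml-model-f16.gguf -p "Hello"

# Offloading even one layer to the GPU triggers the error:
./main -m ggml-model-f16.gguf -p "Hello" -ngl 1
```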
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.
Linux nemo 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Use `convert-bloom-hf-to-gguf.py` to convert to f16 GGML (see the command sketch above).

Failure Logs