ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 546644800, available 536870912) Segmentation fault #356
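For scale, the numbers in the error line work out as follows: the available pool is exactly 512 MiB, and the request overshoots it by roughly 9.3 MiB. A quick check in Python:

# Numbers taken directly from the error message above
needed, available = 546_644_800, 536_870_912
print(available / 2**20)             # 512.0 -> the scratch pool is exactly 512 MiB
print((needed - available) / 2**20)  # ~9.32 -> MiB by which the request overflows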
Comments
Standing by!
It seems to be a llama.cpp issue. I found it mentioned in connection with StarCoder models too. I think you can carry on :)
Update: the issue resolves on reboot, so there is some memory leak in the code. This error would likely resolve if I restarted WSL2, but that is messy for me because I would need to remount my ext4 partitions, and I do not think that particular data point is as significant.
Hey, but for a commercial application we can't afford to have it like this, right? This is happening for me on groovy 1.3, and even then only on some OSes, such as RHEL. It's really frustrating and difficult to deal with.
* Introduce structs for the q4 data blocks
* ggml : rename quant struct variables + fix ARM_NEON

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Hi, I am using GPT4All gpt-j 1.3 groovy, which has an Apache license.
Another update, my
In the hope of helping isolate the bug, I tried to reproduce the issue starting from version 0.1.55.
Environment:
python -V
Python 3.10.12
uname -a
Linux Idan-PC 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Model:
Expected Behavior
This happens (so far) only with these models:
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
based-30b.ggmlv3.q8_0.bin
Larger 65B models work fine. It could be something related to how these models were made; I will also reach out to @ehartford
llama-cpp-python 0.1.59 installed with OpenBLAS
CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
I was running my usual code on the CPU and restarting it to tweak the results when this error came up. I made no code changes other than to the context length, which I reduced because it was exceeding the 2048-token limit.
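For reference, this is roughly the shape of the invocation involved; a minimal sketch assuming the standard llama-cpp-python Llama API, with a hypothetical prompt (only the model name and n_ctx come from this report):

from llama_cpp import Llama

# Hypothetical path and prompt; n_ctx is the context window that was
# reduced to stay under the 2048-token limit mentioned above.
llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin",
    n_ctx=1024,  # reduced from 2048
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])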
Current Behavior
Environment and Context
WSL2
Python 3.10.9
$ lscpu
AMD Ryzen 9 3900XT 12-Core Processor
$ uname -a
5.15.68.1-microsoft-standard-WSL2+ #2 SMP
For me it is 100% reproducible after several inference runs with Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin.
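A minimal sketch of a reproduction loop, assuming the same llama-cpp-python API (the prompt and iteration count are illustrative; the model path is the one named above):

from llama_cpp import Llama

llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin",
    n_ctx=2048,
)

# The error reportedly appears only after several back-to-back runs,
# not on the first inference, so loop a handful of times.
for i in range(10):
    out = llm(f"Run {i}: summarize the plot of Hamlet.", max_tokens=256)
    print(i, len(out["choices"][0]["text"]))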