
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 546644800, available 536870912) Segmentation fault #356

Open
vmajor opened this issue Jun 10, 2023 · 8 comments
Labels
llama.cpp Problem with llama.cpp shared lib

Comments


vmajor commented Jun 10, 2023

Expected Behavior

This happens (so far) only with these models:
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
based-30b.ggmlv3.q8_0.bin

Larger 65B models work fine. It could be something related to how these models are made; I will also reach out to @ehartford

llama-cpp-python 0.1.59 installed with OpenBLAS

CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

I was running my usual code on the CPU and restarting it to tweak the results when this error came up. I made no code changes other than to the context length, which I reduced because it was exceeding the 2048-token limit.

processed_output = self.llm(
    context + "\n### Instruction: \n" + instruction + "\n### Input: \n" + input_text + output,
    max_tokens=400,
    stop=None,
    temperature=0.7,
    repeat_penalty=1.1,
    top_k=80,
    top_p=0.5,
    echo=True,
)
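
For reference, the context-length reduction I mention is done by trimming the prompt before this call. A minimal sketch of the idea (hypothetical helper code, not my exact application, using llama-cpp-python's tokenize/detokenize methods):

# Hypothetical pre-flight trim (sketch, not my exact code): make sure the
# prompt plus the completion budget fits within n_ctx before calling the model.
prompt = (context + "\n### Instruction: \n" + instruction
          + "\n### Input: \n" + input_text + output)
tokens = self.llm.tokenize(prompt.encode("utf-8"))
budget = 2048 - 400  # n_ctx minus max_tokens
if len(tokens) > budget:
    # keep only the most recent tokens, then turn them back into text
    prompt = self.llm.detokenize(tokens[-budget:]).decode("utf-8", errors="ignore")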

Current Behavior

llama.cpp: loading model from /home/****/models/Wizard-Vicuna-30B-Uncensored-GGML/Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 35267.28 MB (+ 6248.00 MB per state)
.
llama_init_from_file: kv self size  = 6240.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Processing all summaries...
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 546644800, available 536870912)
Segmentation fault
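
For what it's worth, the numbers in that error line are telling in themselves: 536870912 bytes is exactly 512 MiB, which looks like a fixed scratch-pool size. A quick sanity check of the arithmetic:

# Sanity-check arithmetic on the values reported above (assumes nothing
# beyond the two numbers printed in the error message)
needed, available = 546644800, 536870912
print(needed / 2**20)      # ~521.32 MiB requested
print(available / 2**20)   # 512.0 MiB scratch pool (exactly 512 * 2**20)
print(needed - available)  # 9773888 bytes, ~9.32 MiB over the pool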

Environment and Context

WSL2
python 3.10.9

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu
AMD Ryzen 9 3900XT 12-Core Processor

  • Operating System, e.g. for Linux:

$ uname -a
5.15.68.1-microsoft-standard-WSL2+ #2 SMP

$ python3 --version
Python 3.10.9

$ make --version
GNU Make 4.3

$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

For me it is 100% reproducible after several inference runs with Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin

@ehartford

Standing by!

vmajor (Author) commented Jun 10, 2023

It seems to be a llama.cpp issue. I found it mentioned in relation to StarCoder models too. I think you can carry on :)

vmajor (Author) commented Jun 10, 2023

Update: the issue resolves on reboot, so there is some memory leak in the code.

Restarting WSL2 alone would likely also clear the error, but that is messy for me because I would need to remount my ext4 partitions afterwards, and I do not think that particular data point is as significant.
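
(For anyone else hitting this on WSL2: a full Windows reboot should not be necessary; `wsl --shutdown` from a Windows prompt restarts the WSL VM, though disks attached with `wsl --mount` still need remounting afterwards.)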

@gjmulder gjmulder added the llama.cpp Problem with llama.cpp shared lib label Jun 10, 2023
@eshaanagarwal

> Update: the issue resolves on reboot, so there is some memory leak in the code.
>
> Restarting WSL2 alone would likely also clear the error, but that is messy for me because I would need to remount my ext4 partitions afterwards, and I do not think that particular data point is as significant.

Hey, but for a commercial application we can't afford to have it like this, right? This is happening for me on groovy 1.3, and even then only on some OSes such as RHEL. It's really frustrating and difficult to deal with.

xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue Jun 13, 2023
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@gjmulder (Contributor)

> Hey, but for a commercial application we can't afford to have it like this, right? This is happening for me on groovy 1.3, and even then only on some OSes such as RHEL. It's really frustrating and difficult to deal with.

  1. Facebook did not release their LLaMA models for commercial application.
  2. Did you pay for a license for any of the models or llama inference code you are using?

@eshaanagarwal

Hi, I am using GPT4All's GPT-J 1.3 Groovy, which has an Apache license.

vmajor (Author) commented Jun 15, 2023

Another update: my guanaco-65B-GGML-q6_K.bin model just failed with the same error, so it is not just the 30B models that are affected.

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 1143972864, available 1073741824)
Segmentation fault
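
(Same pattern as above: 1073741824 bytes is exactly 1 GiB, so the 65B model gets a larger scratch pool, but the requested allocation again overflows it, this time by 70231040 bytes, roughly 67 MiB.)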

ibidani commented Jul 30, 2023

In the hope of helping to isolate the bug, I tried to reproduce the issue starting from version 0.1.55.
The first release where I experience the issue is 0.1.76 (0.1.75 wasn't tested; it isn't available on PyPI), and I didn't see it on 0.1.74.
Could it be related to this change?
v0.1.74...v0.1.76#diff-9184e090a770a03ec97535fbef520d03252b635dafbed7fa99e59a5cca569fbcR200
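
The test loop I used per version was along these lines (a minimal sketch; the model path, prompt, and iteration count are placeholders rather than my exact harness):

# Minimal repro sketch: load the model once, then run repeated completions
# until the scratch-pool error / segfault appears (or not, for a given version)
from llama_cpp import Llama

llm = Llama(model_path="nous-hermes-13b.ggmlv3.q4_0.bin", n_ctx=2048)
for i in range(50):  # placeholder iteration count
    out = llm("### Instruction:\nSummarize the text.\n### Response:\n",
              max_tokens=400)
    print(i, out["choices"][0]["text"][:60])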

Environment

python -V
Python 3.10.12
uname -a
Linux Idan-PC 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Model: nous-hermes-13b.ggmlv3.q4_0.bin
