
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 546644800, available 536870912) Segmentation fault #356

Open
vmajor opened this issue Jun 10, 2023 · 8 comments
Labels
llama.cpp Problem with llama.cpp shared lib

Comments


vmajor commented Jun 10, 2023

Expected Behavior

This happens (so far) only with these models:
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
based-30b.ggmlv3.q8_0.bin

Larger 65B models work fine. It could be something related to how these models are made; I will also reach out to @ehartford

llama-cpp-python 0.1.59 installed with OpenBLAS

CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

I was running my usual code on the CPU and restarting it to tweak the results when this error came up. I made no code changes other than to the context length, which I reduced because it was exceeding the 2048-token limit.

processed_output = self.llm(
    context + "\n### Instruction: \n" + instruction + "\n### Input: \n" + input_text + output,
    max_tokens=400,
    stop=None,
    temperature=0.7,
    repeat_penalty=1.1,
    top_k=80,
    top_p=0.5,
    echo=True,
)
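
For reference, the context-length reduction I mention is done by trimming the prompt before this call. A minimal sketch of the idea (hypothetical helper code, not my exact application, using llama-cpp-python's tokenize/detokenize methods):

# Hypothetical pre-flight trim (sketch, not my exact code): make sure the
# prompt plus the completion budget fits within n_ctx before calling the model.
prompt = (context + "\n### Instruction: \n" + instruction
          + "\n### Input: \n" + input_text + output)
tokens = self.llm.tokenize(prompt.encode("utf-8"))
budget = 2048 - 400  # n_ctx minus max_tokens
if len(tokens) > budget:
    # keep only the most recent tokens, then turn them back into text
    prompt = self.llm.detokenize(tokens[-budget:]).decode("utf-8", errors="ignore")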

Current Behavior

llama.cpp: loading model from /home/****/models/Wizard-Vicuna-30B-Uncensored-GGML/Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 35267.28 MB (+ 6248.00 MB per state)
.
llama_init_from_file: kv self size  = 6240.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Processing all summaries...
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 546644800, available 536870912)
Segmentation fault
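
For what it's worth, the numbers in that error line are telling in themselves: 536870912 bytes is exactly 512 MiB, which looks like a fixed scratch-pool size. A quick sanity check of the arithmetic:

# Sanity-check arithmetic on the values reported above (assumes nothing
# beyond the two numbers printed in the error message)
needed, available = 546644800, 536870912
print(needed / 2**20)      # ~521.32 MiB requested
print(available / 2**20)   # 512.0 MiB scratch pool (exactly 512 * 2**20)
print(needed - available)  # 9773888 bytes, ~9.32 MiB over the pool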

Environment and Context

WSL2
python 3.10.9

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu
AMD Ryzen 9 3900XT 12-Core Processor

  • Operating System, e.g. for Linux:

$ uname -a
5.15.68.1-microsoft-standard-WSL2+ #2 SMP

$ python3 --version
Python 3.10.9

$ make --version
GNU Make 4.3

$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

For me it is 100% reproducible after several inference runs with Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin

@ehartford

Standing by!

vmajor (Author) commented Jun 10, 2023

It seems to be a llama.cpp issue. I found it mentioned in relation to StarCoder models too. I think you can carry on :)

vmajor (Author) commented Jun 10, 2023

Update: the issue resolves on reboot, so there is some memory leak in the code.

Restarting WSL2 alone would likely also clear the error, but that is messy for me because I would need to remount my ext4 partitions afterwards, and I do not think that particular data point is as significant.
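
(For anyone else hitting this on WSL2: a full Windows reboot should not be necessary; `wsl --shutdown` from a Windows prompt restarts the WSL VM, though disks attached with `wsl --mount` still need remounting afterwards.)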

@gjmulder gjmulder added the llama.cpp Problem with llama.cpp shared lib label Jun 10, 2023
@eshaanagarwal

> Update: the issue resolves on reboot, so there is some memory leak in the code.
>
> Restarting WSL2 alone would likely also clear the error, but that is messy for me because I would need to remount my ext4 partitions afterwards, and I do not think that particular data point is as significant.

Hey, but for a commercial application we can't afford to have it like this, right? This is happening for me on groovy 1.3, and even then only on some OSes such as RHEL. It's really frustrating and difficult to deal with.

xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue Jun 13, 2023
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@gjmulder (Contributor)

> Hey, but for a commercial application we can't afford to have it like this, right? This is happening for me on groovy 1.3, and even then only on some OSes such as RHEL. It's really frustrating and difficult to deal with.

  1. Facebook did not release their LLaMA models for commercial application.
  2. Did you pay for a license for any of the models or llama inference code you are using?

@eshaanagarwal

Hi, I am using GPT4All's GPT-J 1.3 Groovy, which has an Apache license.

vmajor (Author) commented Jun 15, 2023

Another update: my guanaco-65B-GGML-q6_K.bin model just failed with the same error, so it is not just the 30B models that are affected.

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 1143972864, available 1073741824)
Segmentation fault
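
(Same pattern as above: 1073741824 bytes is exactly 1 GiB, so the 65B model gets a larger scratch pool, but the requested allocation again overflows it, this time by 70231040 bytes, roughly 67 MiB.)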

ibidani commented Jul 30, 2023

In the hope of helping to isolate the bug, I tried to reproduce the issue starting from version 0.1.55.
The first release where I experience the issue is 0.1.76 (0.1.75 wasn't tested; it isn't available on PyPI), and I didn't see it on 0.1.74.
Could it be related to this change?
v0.1.74...v0.1.76#diff-9184e090a770a03ec97535fbef520d03252b635dafbed7fa99e59a5cca569fbcR200
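
The test loop I used per version was along these lines (a minimal sketch; the model path, prompt, and iteration count are placeholders rather than my exact harness):

# Minimal repro sketch: load the model once, then run repeated completions
# until the scratch-pool error / segfault appears (or not, for a given version)
from llama_cpp import Llama

llm = Llama(model_path="nous-hermes-13b.ggmlv3.q4_0.bin", n_ctx=2048)
for i in range(50):  # placeholder iteration count
    out = llm("### Instruction:\nSummarize the text.\n### Response:\n",
              max_tokens=400)
    print(i, out["choices"][0]["text"][:60])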

Environment

python -V
Python 3.10.12
uname -a
Linux Idan-PC 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Model: nous-hermes-13b.ggmlv3.q4_0.bin
