I've figured it out: you need to change the memory settings in chatglm.h. Also make sure the memory implied by "max_context_length" and "max_tokens" does not exceed that value.
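To make the constraint concrete, here is a minimal sketch in plain Python. The constant names and sizes below are hypothetical, not the actual symbols in chatglm.h; the point is only to illustrate the advice above: the scratch pool is a fixed-size buffer, so the memory required by the requested context length has to fit inside it.

```python
# Illustrative only -- these constants are hypothetical placeholders,
# not the real values or names from chatglm.h.
SCRATCH_SIZE = 1024 * 1024 * 1024   # assumed scratch pool size in bytes
BYTES_PER_CTX_TOKEN = 512 * 1024    # assumed per-token scratch cost

def fits_in_scratch(max_context_length: int) -> bool:
    """Return True if the requested context fits in the fixed scratch pool."""
    return max_context_length * BYTES_PER_CTX_TOKEN <= SCRATCH_SIZE

fits_in_scratch(2048)  # within budget
fits_in_scratch(2400)  # exceeds it -> the ggml scratch-pool error
```

With these assumed numbers, 2048 tokens just fits while 2400 overflows, which matches the 2048 threshold reported below; the real cutoff depends on the actual buffer sizes compiled into chatglm.cpp.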
First off, big praise for this project's inference speedup!
Environment: Linux, Python 3.8
I'm using the Python bindings of the compiled chatglm.cpp module with a q4_0-quantized chatglm2-6b.
Inference settings:
generation_kwargs = dict(
    max_length=6000,
    max_context_length=2400,
    do_sample=args.temp > 0,
    top_k=args.top_k,
    top_p=args.top_p,
    temperature=args.temp,
    repetition_penalty=args.repeat_penalty,
    stream=True,
)
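For reference, the settings above can be sanity-checked in plain Python before handing them to the pipeline. The check function below is just an illustration I wrote, not part of chatglm.cpp, and the concrete sampling values filled in for the `args.*` fields are made up for the example:

```python
# Hypothetical sanity check for the generation kwargs above
# (not part of chatglm.cpp).
def check_generation_kwargs(kwargs: dict) -> None:
    # The prompt (context) must leave room inside the overall length budget.
    if kwargs["max_context_length"] > kwargs["max_length"]:
        raise ValueError("max_context_length cannot exceed max_length")
    if kwargs["do_sample"] and not (0.0 < kwargs["top_p"] <= 1.0):
        raise ValueError("top_p must be in (0, 1]")

generation_kwargs = dict(
    max_length=6000,
    max_context_length=2400,
    do_sample=True,          # placeholder for args.temp > 0
    top_k=0,                 # placeholder for args.top_k
    top_p=0.7,               # placeholder for args.top_p
    temperature=0.95,        # placeholder for args.temp
    repetition_penalty=1.0,  # placeholder for args.repeat_penalty
    stream=True,
)
check_generation_kwargs(generation_kwargs)  # passes for these values
```

Note that these kwargs are internally consistent; the failure below comes from the fixed scratch buffer inside the C++ code, not from the Python-side settings.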
When I set max_context_length > 2048 (e.g. the very long contexts that come up in langchain scenarios), it fails with:
ggml_new_tensor_impl: not enough space in the scratch memory pool
Quite a few llama.cpp users seem to have hit this; a Google search turns up reports everywhere.
Searching on Google, I found that llama-cpp-python has a similar issue:
abetlen/llama-cpp-python#356
abetlen/llama-cpp-python#356 (comment) suggests it is a memory leak in llama.cpp.
The llama.cpp project has this bug as well, and some users apparently resolved it by rolling back versions (llama-cpp-python back to 0.1.74):
ggerganov/llama.cpp#29 (comment)
ggerganov/llama.cpp#2404 (comment)
How can I work around this bug when running inference with this project? Thanks!