
Error when max_context_length > 2048 (e.g. very long contexts in LangChain scenarios): ggml_new_tensor_impl: not enough space in the scratch memory pool #136

Closed
valkryhx opened this issue Oct 8, 2023 · 5 comments



valkryhx commented Oct 8, 2023

First of all, big praise for this project's inference speedup. Awesome!

Environment: Linux, Python 3.8.
After compiling chatglm.cpp with its Python bindings, I'm running chatglm2-6b quantized with q4_0.
Generation settings:
```python
# args is the argparse.Namespace from the CLI demo script.
generation_kwargs = dict(
    max_length=6000,
    max_context_length=2400,  # exceeds the default 2048-token budget
    do_sample=args.temp > 0,
    top_k=args.top_k,
    top_p=args.top_p,
    temperature=args.temp,
    repetition_penalty=args.repeat_penalty,
    stream=True,
)
```
With max_context_length > 2048 (e.g. the very long contexts that come up with LangChain), this fails with:

```
ggml_new_tensor_impl: not enough space in the scratch memory pool
```
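For reference, the call pattern that triggers it. This is a minimal sketch assuming the v0.2-era chatglm_cpp Python API (Pipeline.chat takes a list of history strings and streams text chunks when stream=True); the model path and prompt are placeholders:

```python
import chatglm_cpp

# Placeholder path to a q4_0-quantized ChatGLM2-6B GGML file.
pipeline = chatglm_cpp.Pipeline("./chatglm2-ggml-q4_0.bin")

# A long retrieved context, e.g. documents stuffed into the prompt
# by a LangChain retrieval chain (placeholder text).
long_prompt = "......" * 1000

for chunk in pipeline.chat(
    [long_prompt],
    max_length=6000,
    max_context_length=2400,  # anything above 2048 crashes a stock build
    do_sample=False,
    stream=True,
):
    print(chunk, end="", flush=True)
```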
Quite a few llama.cpp users seem to have hit this; a Google search turns up reports everywhere.

llama-cpp-python has a similar issue: abetlen/llama-cpp-python#356. One comment there, abetlen/llama-cpp-python#356 (comment), attributes it to a memory leak in llama.cpp.

The llama.cpp project has the same bug on record, and some people apparently fixed it by rolling back versions (llama_cpp_python back to 0.1.74):
ggerganov/llama.cpp#29 (comment)
ggerganov/llama.cpp#2404 (comment)

How do I work around this bug when running inference with this project? Thanks!

@trekrollercoaster

Same issue: #131


ISNing commented Dec 13, 2023

I'm hitting the same problem:

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 1357824000, available 1342177280)

In my case the cutoff isn't exactly 2048 either; it's somewhat larger.
My model is chatglm3-6b-32k with q5_1 quantization.
By the way, what does the 1342177280 here refer to? CUDA VRAM or host RAM?
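For what it's worth, the two byte counts in that message decode cleanly. Reading "available" as the compiled-in scratch-pool size is an assumption, but it lines up with the constants linked in a later comment:

```python
# Figures copied from the error message above.
needed, available = 1357824000, 1342177280

MiB = 1024 ** 2
print(available / MiB)             # 1280.0 -> exactly 1280 MiB preallocated
print((needed - available) / MiB)  # ~14.92 MiB short for this request
```

So the request needed roughly 15 MiB more than the fixed-size arena ggml carves tensors out of.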


Pan06da commented Jan 27, 2024

Running into the same problem here. Has anyone solved it?


VaalaCat commented Feb 7, 2024

Solved: you need to change these memory-allocation values in chatglm.h, and also make sure max_context_length and max_tokens never take up more than the configured memory, otherwise the program will crash:

https://github.com/li-plus/chatglm.cpp/blob/main/chatglm.h#L1019-L1020
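Alternatively, as a Python-side stopgap that avoids patching and rebuilding, keep requests within the budget the binary was compiled for. clamp_generation_kwargs below is a hypothetical helper, not part of the chatglm_cpp API, and 2048 is the stock limit discussed above:

```python
def clamp_generation_kwargs(kwargs: dict, context_limit: int = 2048) -> dict:
    """Return a copy of the generation kwargs with max_context_length capped
    so a stock chatglm.cpp build never outgrows its preallocated scratch pool.
    Hypothetical convenience helper, not part of chatglm_cpp itself."""
    clamped = dict(kwargs)
    clamped["max_context_length"] = min(
        clamped.get("max_context_length", context_limit), context_limit
    )
    return clamped

# Usage: pipeline.chat(history, **clamp_generation_kwargs(generation_kwargs))
```

Truncating the context this way trades answer quality for stability; raising the constants in chatglm.h and recompiling is the real fix on affected versions.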

li-plus (Owner) commented Jun 21, 2024

Fixed by #305. The latest version (v0.4.0) allocates memory on demand, so there is no longer any need to preset the scratch size / memory size; as long as the device has enough memory, long-context inference works.

li-plus closed this as completed on Jun 21, 2024