A recent paper by Meta/MIT/CMU proposed StreamingLLM, a simple yet efficient way to enable "infinite" context. Better yet, the implementation in llama.cpp is as trivial as changing the `n_keep` value via the `--keep` option, as discussed in this issue. Unfortunately, the high-level API of llama-cpp-python does not support the `keep`/`n_keep` parameter.
It should be simple to add the parameter to the high-level API, ideally in the constructor of the `Llama` class, and to pass it along to `llama_cpp.llama_load_model_from_file` as part of the `lparams` parameter.
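As a rough illustration, a minimal sketch of what the proposed pass-through could look like. This is not the actual llama-cpp-python code: the constructor body is simplified, and it assumes the context-parameters struct used as `lparams` gains an `n_keep` field, which does not exist there today.

```python
import llama_cpp


class Llama:
    """Sketch only: shows where an n_keep argument might be threaded through."""

    def __init__(self, model_path: str, n_ctx: int = 512, n_keep: int = 0):
        # Build the context params the same way the high-level API does today.
        lparams = llama_cpp.llama_context_params()
        lparams.n_ctx = n_ctx
        lparams.n_keep = n_keep  # proposed new field; assumed, not in the current struct
        self.model = llama_cpp.llama_load_model_from_file(
            model_path.encode("utf-8"), lparams
        )
```

Hypothetical usage once exposed would then just be `Llama(model_path="...", n_ctx=4096, n_keep=4)`, mirroring `--keep 4` on the llama.cpp command line.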