forked from openvinotoolkit/openvino.genai
Control KV-cache size for StaticLLMPipeline (openvinotoolkit#795)
# Overview

Introduce _MAX_PROMPT_LEN_ and _MIN_RESPONSE_LEN_:

* _MAX_PROMPT_LEN_ - the maximum number of tokens that StaticLLMPipeline may process for the input prompt
* _MIN_RESPONSE_LEN_ - the minimum number of tokens that will be returned as the result of generation

```
ov::AnyMap pipeline_config;
pipeline_config["MAX_PROMPT_LEN"] = 1024u;
pipeline_config["MIN_RESPONSE_LEN"] = 100u;
ov::LLMPipeline pipe(model_path, "NPU", pipeline_config);
```

The KV-cache sizes for the models are calculated the following way:

- Prefill KV-cache: _MAX_PROMPT_LEN_
- Generate KV-cache: _MAX_PROMPT_LEN_ + _MIN_RESPONSE_LEN_

By default, _MAX_PROMPT_LEN_ and _MIN_RESPONSE_LEN_ are assigned to 1024 and 150 respectively.
1 parent 6aa71fc · commit 795bb00
Showing 2 changed files with 12 additions and 10 deletions.