forked from openvinotoolkit/openvino
Commit
[GPU] Improve kv cache memory allocation efficiency (openvinotoolkit#25580)

### Details:
 - Fixed two issues:
   - 1) The KV cache allocated redundant memory whenever it required new memory.
   - 2) At a new inference, the KV cache reused the padding value from the previous execution (the last token of the previous generation), which made memory usage inefficient.
 - After fixing the above issues, memory is allocated more frequently in some cases because:
   - 1) Switching shape 1024 => 32: reclaiming now happens (previously, due to the wrong padding, the memory was not reclaimed).
   - 2) Switching shape 32 => 1024: a new allocation is needed at the first inference, but the shape history is not tracked yet, so new memory is allocated for 3 iterations.
 - Additional fixes to resolve the above issues:
   - 1) For the initial allocation of the KV cache, enforce preallocation with a custom prealloc count (known value of 128 + id%64) for the sequence axis.
   - 2) For reclaiming the KV cache: use the prealloc size as the required memory size.

Memalloc count with PR
![image](https://github.com/user-attachments/assets/c65b3335-c849-46f8-b9fe-140c3a0fbccb)

### Tickets:
 - 146930
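
The two additional fixes can be illustrated with a minimal sketch. This is not the code from the commit: `prealloc_count`, `KvCacheBuffer`, `ensure`, and `reclaim` are hypothetical names, `id` is assumed to be some per-cache identifier, and the condition that triggers reclaiming in the real plugin is not modeled. Only the `128 + id%64` value and the idea of sizing a reclaimed buffer by the prealloc size come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: extra sequence-axis slots reserved for a KV cache.
// The value 128 + id % 64 is the one quoted in the commit message.
std::size_t prealloc_count(std::size_t id) {
    return 128 + id % 64;
}

// Hypothetical model of one KV cache buffer, tracking only the
// sequence-axis capacity and how many times memory was (re)allocated.
struct KvCacheBuffer {
    std::size_t capacity = 0;     // allocated length along the sequence axis
    std::size_t alloc_count = 0;  // number of (re)allocations so far

    // Make room for `required` tokens. On a miss, allocate the required
    // length plus the prealloc count so later tokens fit without a new
    // allocation. Returns true if memory was (re)allocated.
    bool ensure(std::size_t required, std::size_t id) {
        if (required <= capacity) {
            return false;
        }
        capacity = required + prealloc_count(id);
        ++alloc_count;
        assert(capacity >= required);
        return true;
    }

    // Shrink an oversized buffer. The new size is the required length plus
    // the prealloc count (not the bare required length), so the tokens that
    // follow a reclaim do not immediately force another allocation.
    void reclaim(std::size_t required, std::size_t id) {
        const std::size_t target = required + prealloc_count(id);
        if (target < capacity) {
            capacity = target;
            ++alloc_count;
        }
    }
};
```

Under these assumptions, switching from a sequence length of 1024 to 32 with `id == 0` shrinks the buffer from 1152 to 160 slots, and the next 128 tokens after the reclaim fit without another allocation.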
Showing 3 changed files with 94 additions and 34 deletions.