KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing

Recommended software environment

  • python == 3.10
  • torch == 2.1.2
  • transformers >= 4.38.0
  • scikit-learn >= 1.0
  • tqdm >= 4.49.0
  • numpy >= 1.20.2

Description

  • The KVSharer workflow is described in two runnable notebooks, test_llama.ipynb and test_internlm.ipynb, which detail how to conduct the strategy search and how to integrate KVSharer for inference (a rough illustration of the search idea follows this list).
  • The main implementation of sharing during inference is in llama_real_share/cache_utils.py and internlm2_real_share/cache_utils.py, where we introduce a new class, DynamicDictCache, that stores the KV cache for only a subset of layers (see the cache sketch after this list).
  • We add a kv_cache_share_layers_map parameter to LlamaForCausalLM and InternLM2ForCausalLM to set the sharing strategy; the implementation is in llama_real_share/modeling_llama_kvsharer.py and internlm2_real_share/modeling_internlm2_kvsharer.py (see the usage sketch after this list).
  • We provide a wiki_demo.txt file in the ./data folder for testing.
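
The notebooks contain the actual search procedure; as a rough, hypothetical illustration of the dissimilarity-based ranking step only (the metric and details here are assumptions, not the notebooks' code), one might flatten each layer's calibration KV cache and rank layer pairs from most to least similar:

```python
# Hypothetical sketch of ranking layer pairs by KV-cache dissimilarity.
# The actual strategy search is in test_llama.ipynb / test_internlm.ipynb.
import torch
from sklearn.metrics.pairwise import cosine_similarity

def rank_layer_pairs_by_dissimilarity(kv_cache):
    """kv_cache: list over layers of (key, value) tensors collected on calibration text."""
    # Flatten each layer's keys and values into one vector per layer.
    flat = [torch.cat([k.flatten(), v.flatten()]).float().cpu().numpy() for k, v in kv_cache]
    sim = cosine_similarity(flat)  # (num_layers, num_layers) similarity matrix
    pairs = [(i, j, sim[i, j]) for i in range(len(flat)) for j in range(i)]
    # Most dissimilar pairs first: these are the candidate (target, source) shares.
    return sorted(pairs, key=lambda p: p[2])
```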
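
To give an idea of the sharing mechanism, here is a minimal sketch of a dict-style cache that stores KV tensors only for "source" layers and redirects shared layers through the sharing map. This is illustrative only; the real class is DynamicDictCache in llama_real_share/cache_utils.py and internlm2_real_share/cache_utils.py.

```python
# Minimal, assumed sketch of a dict-based KV cache with layer-wise sharing.
import torch

class SharedLayerKVCache:
    def __init__(self, share_map):
        # share_map: {layer_idx: source_layer_idx}; a layer mapped to itself keeps its own cache.
        self.share_map = share_map
        self.cache = {}  # only source layers ever appear as keys here

    def update(self, key, value, layer_idx):
        src = self.share_map[layer_idx]
        if src == layer_idx:
            # This layer owns its cache: append new keys/values along the sequence dim.
            if src not in self.cache:
                self.cache[src] = (key, value)
            else:
                k, v = self.cache[src]
                self.cache[src] = (torch.cat([k, key], dim=-2), torch.cat([v, value], dim=-2))
        # Shared layers reuse the source layer's stored KV and never write their own.
        return self.cache[src]
```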
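
As a hedged usage sketch, the sharing strategy is expressed as a layer-to-layer map and handed to the modified model. Everything beyond the kv_cache_share_layers_map name and the file paths above is an assumption (the checkpoint, the way the map is attached, and the specific layer pair are placeholders); see the notebooks for the actual calls.

```python
# Hypothetical usage sketch; the real interface is defined in
# llama_real_share/modeling_llama_kvsharer.py (see test_llama.ipynb for the actual calls).
from transformers import AutoTokenizer
from llama_real_share.modeling_llama_kvsharer import LlamaForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

# Map each layer to the layer whose KV cache it should reuse;
# a layer mapped to itself keeps its own cache.
num_layers = model.config.num_hidden_layers
share_map = {i: i for i in range(num_layers)}
share_map[20] = 5  # example only: layer 20 reuses layer 5's KV cache
model.kv_cache_share_layers_map = share_map  # assumed way of setting the strategy

inputs = tokenizer("KVSharer shares KV caches across dissimilar layers.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```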

Note

This repo is under construction.
