Cache docs: update #32929
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
docs/source/en/kv_cache.md (outdated)

```python
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)

>>> INITIAL_PROMPT = "You are a helpful assistant. "
>>> prompt_cache = DynamicCache()
```
I would rather promote the use of `StaticCache` here, but both are fine!
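For context, a minimal sketch of what the `StaticCache` variant could look like; the model id and cache length here are illustrative assumptions, not taken from this PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pre-allocate a fixed-size cache instead of letting it grow dynamically.
prompt_cache = StaticCache(
    config=model.config, max_batch_size=1, max_cache_len=1024, device="cuda", dtype=torch.bfloat16
)

inputs = tokenizer("You are a helpful assistant. ", return_tensors="pt").to("cuda")
with torch.no_grad():
    prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values
```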
Oh, actually I recently found this doesn't work: `copy.deepcopy` fails even after making it a torch Module. @gante confirmed that it's expected to fail, so I'll remove this section.
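The failure being described can be reproduced outside of any cache class. A minimal sketch (variable names are illustrative): `copy.deepcopy` only supports leaf tensors, and key/values produced inside the model's forward pass are non-leaf.

```python
import copy
import torch

w = torch.ones(2, 2, requires_grad=True)  # leaf tensor
y = w * 2                                 # non-leaf: the result of an autograd op

copy.deepcopy(w)  # works
copy.deepcopy(y)  # RuntimeError: only graph-leaf Tensors support the deepcopy protocol
```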
yeah we need to fix the copying issue (and add a test!)
Done. The reason was that the model forward should be run without grad; otherwise the key/values are non-leaf tensors. Fixed the example and verified it runs.
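A sketch of the fixed flow: run the prompt forward pass under `torch.no_grad()` so the cached key/values are leaf tensors and the cache stays copyable. The model id and prompts below are illustrative:

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

INITIAL_PROMPT = "You are a helpful assistant. "
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to("cuda")

# Fill the cache once, without grad.
prompt_cache = DynamicCache()
with torch.no_grad():
    prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values

# Reuse a copy of the prefilled cache for each follow-up prompt.
new_inputs = tokenizer(INITIAL_PROMPT + "Help me write a blog post about travelling.", return_tensors="pt").to("cuda")
outputs = model.generate(**new_inputs, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```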
Let me know if you have any comments, otherwise I will merge.
On cache reuse (copying a cache object): #33297. The problem is a bit more complex than `no_grad` on some cache classes :P
Mmm, does that mean the current `deepcopy` doesn't copy all tensors from the list when we use dynamic cache?
Dynamic cache should be okay with `no_grad` at the moment 🤗 Other caches, however, have objects that can't be copied (e.g. in the offloaded caches, the CUDA stream can't be copied). I'm writing a PR that lifts the `no_grad` requirement and handles the other corner cases. And, more importantly, adds tests!
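A minimal sketch of that offloaded-cache corner case; the class below is a hypothetical stand-in, not the real offloaded cache, but it shows why `copy.deepcopy` fails on any object holding a CUDA stream:

```python
import copy
import torch

class HoldsStream:
    """Hypothetical stand-in for a cache that keeps a CUDA stream for prefetching."""
    def __init__(self):
        self.stream = torch.cuda.Stream()  # requires a CUDA device

copy.deepcopy(HoldsStream())  # TypeError: cannot pickle 'torch.cuda.Stream' object
```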
LGTM, thank you for addressing these issues 💪
Force-pushed from b44473d to d4fce37
Added a slow test for cache copying
* some changes
* more updates
* fix cache copy
* nits
* nits
* add tests
What does this PR do?
Updates the docs with feedback from the community, makes some points clearer, and fixes the docstring. I will run the doctest locally and see if we can trigger it via CI here.
Fixes #32919