-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Text Generation][KVCacheStorage] TextGenerationPipeline
refactor
#1254
Conversation
…ralmagic/deepsparse into feature/damian/chat_pipeline
@@ -670,6 +654,9 @@ def engine_forward( | |||
if streamer is not None: | |||
streamer.end() | |||
|
|||
if self._debug: | |||
self._debug = dict(kv_cache=session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
purely for testing purposes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even for debug we won't want to update state at runtime, we should attach this to the returned output schema if possible (does not need to be part of the schema)
TextGenerationPipeline
refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor looks much nicer than original code! GG!
@@ -670,6 +654,9 @@ def engine_forward( | |||
if streamer is not None: | |||
streamer.end() | |||
|
|||
if self._debug: | |||
self._debug = dict(kv_cache=session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even for debug we won't want to update state at runtime, we should attach this to the returned output schema if possible (does not need to be part of the schema)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits!
failures look unrelated - merging |
As agreed with the team, the old design for
KVCacheSessionStorage
was ugly, given the series of recent refactors.The goal of this PR is to decouple the
DecoderKVCache
from theNLDecoderEngine
. This will allow us to implement the upcomingChatPipeline(TextGenerationPipeline)
much more cleanly.This is roughly the design envisioned:
Testing:
test_text_generation.py