Add assistant prefill for chat templates and TextGenerationPipeline #33198
Conversation
Let me know when you'd like a review @Rocketknight1!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@LysandreJik should be ready for review now! I had to tweak a couple of other tests that ended in assistant messages - the output was garbage in those cases anyway, and the tests were just checking that it wasn't changing unexpectedly. For visibility:
Instead of …
@DIYer22 I'm not sure about that - the "assistant" role is universal across all chat models that I know of. Although some users might want the model to continue a user or system message, I think that's more of a fun/research thing than a common use-case, and I think it's okay to expect people to construct their own prompts/templates for those cases.
There are other models in private that don't use standard … I don't think using a model through the standard … Not to mention "chat_templates" are not just for chat models, there are so many ways that the API can be used if things aren't artificially limited. Edit: I am good with the approach in general though (not needing to modify the templates themselves is great).
Hmn, okay, there's more demand for it than I thought - you were right @DIYer22! @Tostino @DIYer22 I've replaced `assistant_prefill` with `continue_final_message`. I've kept the …
Update: Pipeline now correctly handles the `continue_final_message` argument.
@LysandreJik sorry for the delay, should be ready for actual review now!
Ok makes sense! Thanks @Rocketknight1
if continue_final_message is None:
    continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
Maybe add a comment explaining what is being done under the hood here (so assuming that the request is to continue the final message if the last message was already sent by the assistant?)
Done!
# Here we check that passing a chat that ends in an assistant message is handled correctly
# by continuing the final message rather than starting a new one
text_generator = pipeline(
    task="text-generation", model="rocketknight1/tiny-gpt2-with-chatml-template", framework="pt"
)
Can you move that model to hf-internal-testing? We're trying to move away from user/contributor/maintainer-hosted checkpoints in our CI.
Done!
# Here we check that passing a chat that ends in an assistant message is handled correctly
# by continuing the final message rather than starting a new one
text_generator = pipeline(
    task="text-generation", model="rocketknight1/tiny-gpt2-with-chatml-template", framework="pt"
)
same here
Done!
Co-authored-by: Lysandre Debut <hi@lysand.re>
@LysandreJik all comments addressed/merged, and moved all the repos to hf-internal-testing. Let me know if you're happy to merge, or if you want to re-review any of that!
Looks good! Thanks for iterating on this @Rocketknight1
Thank you very much for the work on this @Rocketknight1. I'm really happy you came up with a solution that helps solve all the needs brought up.
Thanks @Tostino! Please let me know if you encounter any issues while using it - you can try it out right now by installing from …
@Rocketknight1 There seems to be an issue... I was testing it the other day and just couldn't get it working... and then I needed to continue on my roadtrip. Just got back to testing and realized there is a discrepancy between the code and the documentation. The documentation says `continue_last_message`, but the code expects `continue_final_message`.
@Tostino I have no idea how I overlooked that! I think I just changed the name of the argument quite late in the PR, opening a fix now.
Add assistant prefill for chat templates and TextGenerationPipeline (huggingface#33198)

* Add assistant prefill to chat templates
* Add assistant prefill to pipeline
* Add assistant prefill to pipeline
* Tweak another test that ended in assistant message
* Update tests that ended in assistant messages
* Update tests that ended in assistant messages
* Replace assistant_prefill with continue_final_message
* Allow passing continue_final_message to pipeline
* Small fixup
* Add continue_final_message as a pipeline kwarg
* Update docstrings
* Move repos to hf-internal-testing!
* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Add explanatory comment
* make fixup
* Update chat templating docs to explain continue_last_message

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Something that's been requested several times, both internally and on GitHub, is assistant prefill: the ability to begin the model's response for it and let the model continue from there.
We use a slightly hacky solution suggested by @Narsil, which I think works in all the cases I know of: when we do assistant prefill, we simply truncate the formatted chat at the end of the final message text, removing any tokens that indicate the end of an assistant message. The model then continues from wherever the final message ends.
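As a rough sketch of how the new argument can be used (the checkpoint name and chat contents below are illustrative placeholders, not taken from the PR):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint - any tokenizer that ships a chat template will do.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Give me the capital of France as JSON."},
    # Prefill the start of the assistant's reply; generation should pick up from here.
    {"role": "assistant", "content": '{"capital": "'},
]

# continue_final_message=True truncates the formatted chat at the end of the final
# message text, so no end-of-assistant tokens are appended after the prefill.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, continue_final_message=True)
print(prompt)
```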
This PR also updates `TextGenerationPipeline` so that if you pass a chat where the final message is an assistant message, it will assume that you want to treat that message as an assistant prefill. Before this PR, it would start a new message in that case, which generally resulted in malformed output because most models were not trained on multiple consecutive assistant messages.

Fixes #33096
Fixes #32213
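And a corresponding sketch of the pipeline behaviour described above, reusing the tiny test checkpoint quoted in the review threads (the chat contents are again illustrative):

```python
from transformers import pipeline

# Tiny test checkpoint from the review snippets above (later mirrored under hf-internal-testing for CI).
generator = pipeline(
    task="text-generation", model="rocketknight1/tiny-gpt2-with-chatml-template", framework="pt"
)

chat = [
    {"role": "user", "content": "List two colors as a JSON array."},
    {"role": "assistant", "content": '["red", "'},  # prefilled start of the reply
]

# Because the chat ends with an assistant message, the pipeline defaults
# continue_final_message to True and extends that message instead of opening a new one.
out = generator(chat, max_new_tokens=10)
print(out[0]["generated_text"])
```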