
Add assistant prefill for chat templates and TextGenerationPipeline #33198

Merged: 16 commits into main, Sep 2, 2024

Conversation

@Rocketknight1 (Member) commented Aug 29, 2024

Something that's been requested several times, both internally and on GitHub, is assistant prefill: the ability to begin the model's response for it and let it continue.

We use a slightly hacky solution suggested by @Narsil, which I think will work in every case I know of: when we do assistant prefill, we simply truncate the formatted chat at the end of the final message's text, removing any tokens that indicate the end of an assistant message. The model then simply continues from wherever the final message ends.
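
For illustration, here is a minimal sketch of that idea (not the PR's actual implementation; the checkpoint name is just an example of a model with a chat template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Write a haiku about autumn."},
    {"role": "assistant", "content": "Crisp leaves drift down"},  # prefill to be continued
]

rendered = tokenizer.apply_chat_template(chat, tokenize=False)

# Cut the rendered string right after the final message's text, dropping whatever
# the template appends afterwards (e.g. an end-of-turn token such as <|im_end|>),
# so generation picks up from the prefill instead of starting a new turn.
final_text = chat[-1]["content"]
prompt = rendered[: rendered.rindex(final_text) + len(final_text)]
```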

This PR also updates TextGenerationPipeline so that if you pass a chat where the final message is an assistant message, it will assume that you want to treat the final message as an assistant prefill. Before this PR, it would start a new message if you did this, which would generally result in malformed output because most models were not trained with multiple consecutive assistant messages.
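
A rough usage sketch of the new pipeline behaviour (the model id is illustrative; any chat model with a template behaves the same way):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Give me a JSON object describing a cat."},
    {"role": "assistant", "content": '{"name": "'},  # trailing assistant message acts as a prefill
]

# With this PR, the pipeline continues the final assistant message
# instead of opening a brand-new assistant turn.
out = generator(chat, max_new_tokens=40)
print(out[0]["generated_text"])  # the conversation, with the final assistant message continued
```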

Fixes #33096

Fixes #32213

@LysandreJik (Member)

Let me know when you'd like a review @Rocketknight1!

@Rocketknight1 (Member, Author) commented Aug 29, 2024

@LysandreJik should be ready for review now! I had to tweak a couple of other tests that ended in assistant messages - the output was garbage in those cases anyway, and the tests were just checking that it wasn't changing unexpectedly.

For visibility:
cc @philschmid, who requested this. cc @Narsil for TGI and @xenova for transformers.js, though note that this doesn't change Jinja behaviour at all; it just adds a flag that trims the output string in Python! Also cc @gante for generation.

See also #33096 and #32213, where this was requested.

@DIYer22 commented Aug 29, 2024

Instead of assistant_prefill, maybe just add a new parameter called prefill_last_message to apply_chat_template. I feel there's no need to limit prefill to the assistant role. That way it's compatible with the current needs and can also adapt to potential future ones. Plus, the parameter name and the corresponding changes are simple and clear enough that there won't be any misunderstandings.

@Rocketknight1 (Member, Author)

@DIYer22 I'm not sure about that - the "assistant" role is universal across all chat models that I know of. Although some users might want the model to continue a user or system message, I think that's more of a fun/research thing than a common use-case, and I think it's okay to expect people to construct their own prompts/templates for those cases.

@Tostino commented Aug 29, 2024

There are other private models that don't use the standard assistant/user roles; they will likely be forced to by this type of change, or will just use something other than chat templates for formatting their prompts (hurting standardization even more).

I don't think using a model through the standard chat/completion-style endpoint to auto-complete a user's message is a far-off use case. It's just something that OpenAI doesn't support with their endpoints, so tooling hasn't been built to support it yet. That was one of the main things I was trying to enable (along with assistant prefill).

Not to mention that chat templates are not just for chat models; there are so many ways the API can be used if things aren't artificially limited.

Edit: I am good with the approach in general though (not needing to modify the templates themselves is great).

@Rocketknight1 (Member, Author) commented Aug 29, 2024

Hmm, okay, there's more demand for it than I thought - you were right, @DIYer22!

@Tostino @DIYer22 I've replaced assistant_prefill with continue_final_message. It has the same behaviour as assistant_prefill, but will no longer raise an error if the final message is not an assistant message.
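
A rough usage sketch with the renamed argument (the chat content and checkpoint are made up for illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Can you describe the weather today?"},
    {"role": "assistant", "content": "Certainly! Today the weather is"},
]

# continue_final_message=True renders the chat without the end-of-turn tokens
# after the last message, so generation continues it rather than starting a new turn.
prompt = tokenizer.apply_chat_template(
    chat, tokenize=False, continue_final_message=True
)
```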

I've kept the TextGenerationPipeline behaviour the same, however. I'll think about how I could possibly add support there without disrupting the UX for simple use-cases.

@Rocketknight1 (Member, Author) commented Aug 29, 2024

Update: Pipeline now correctly handles the continue_final_message kwarg, which overrides the default behaviour of continuing assistant messages only.
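
For example (a sketch, assuming an illustrative chat model), passing continue_final_message=True makes the pipeline continue even a trailing user message, while continue_final_message=False would force a new assistant turn even after a trailing assistant message:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "The capital of France is"},
]

# Override the default (which only continues trailing assistant messages)
# so the trailing user message itself is continued.
out = generator(chat, max_new_tokens=5, continue_final_message=True)
```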

@Rocketknight1 (Member, Author)

@LysandreJik sorry for the delay, should be ready for actual review now!

@LysandreJik (Member) left a comment:

Ok makes sense! Thanks @Rocketknight1

Comment on lines 416 to 417:

```python
if continue_final_message is None:
    continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
```
@LysandreJik (Member):

Maybe add a comment explaining what is being done under the hood here (i.e., that we assume the request is to continue the final message if the last message was already sent by the assistant)?

@Rocketknight1 (Member, Author): Done!

```python
# Here we check that passing a chat that ends in an assistant message is handled correctly
# by continuing the final message rather than starting a new one
text_generator = pipeline(
    task="text-generation", model="rocketknight1/tiny-gpt2-with-chatml-template", framework="pt"
)
```
@LysandreJik (Member):

Can you move that model to hf-internal-testing? We're trying to move away from user/contributor/maintainer-hosted checkpoints in our CI.

@Rocketknight1 (Member, Author): Done!

```python
# Here we check that passing a chat that ends in an assistant message is handled correctly
# by continuing the final message rather than starting a new one
text_generator = pipeline(
    task="text-generation", model="rocketknight1/tiny-gpt2-with-chatml-template", framework="pt"
)
```
@LysandreJik (Member):

same here

@Rocketknight1 (Member, Author): Done!

@Rocketknight1 (Member, Author) commented Aug 30, 2024

@LysandreJik all comments addressed/merged, and moved all the repos to hf-internal-testing. Also updated a couple of other cases where I was using personal repos for tests that were already merged to main. I also added a section to the chat templating docs about this.

Let me know if you're happy to merge, or if you want to re-review any of that!

@LysandreJik (Member) left a comment:

Looks good! Thanks for iterating on this @Rocketknight1

@Rocketknight1 merged commit 52a0213 into main on Sep 2, 2024 (25 checks passed) and deleted the add_assistant_prefill branch on September 2, 2024 at 12:23.
@Tostino commented Sep 2, 2024

Thank you very much for the work on this @Rocketknight1. I'm really happy you came up with a solution that helps solve all the needs brought up.

@Rocketknight1 (Member, Author)

Thanks @Tostino! Please let me know if you encounter any issues while using it - you can try it out right now by installing from main.

@Tostino commented Sep 6, 2024

@Rocketknight1 There seems to be an issue...

I was testing it the other day and just couldn't get it working...and then I needed to continue on my roadtrip. Just got back to testing and realized there is a discrepancy between the code and the documentation.

The documentation says continue_last_message, but the code actually uses continue_final_message.

@Rocketknight1 (Member, Author)

@Tostino I have no idea how I overlooked that! I think I just changed the name of the argument quite late in the PR, opening a fix now.

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…uggingface#33198)

* Add assistant prefill to chat templates

* Add assistant prefill to pipeline

* Add assistant prefill to pipeline

* Tweak another test that ended in assistant message

* Update tests that ended in assistant messages

* Update tests that ended in assistant messages

* Replace assistant_prefill with continue_final_message

* Allow passing continue_final_message to pipeline

* Small fixup

* Add continue_final_message as a pipeline kwarg

* Update docstrings

* Move repos to hf-internal-testing!

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Add explanatory comment

* make fixup

* Update chat templating docs to explain continue_last_message

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>