diff --git a/docs/source/en/chat_templating.md b/docs/source/en/chat_templating.md
index 01ff5fed46e1..f65dbc016e16 100644
--- a/docs/source/en/chat_templating.md
+++ b/docs/source/en/chat_templating.md
@@ -196,7 +196,7 @@ Not all models require generation prompts. Some models, like LLaMA, don't have a
 special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
 effect that `add_generation_prompt` has will depend on the template being used.

-## What does "continue_last_message" do?
+## What does "continue_final_message" do?

 When passing a list of messages to `apply_chat_template` or `TextGenerationPipeline`, you can choose to format the chat
 so the model will continue the final message in the chat instead of starting a new one. This is done
@@ -211,7 +211,7 @@ chat = [
     {"role": "assistant", "content": '{"name": "'},
 ]

-formatted_chat = tokenizer.apply_chat_template(chat, tokenize=True, return_dict=True, continue_last_message=True)
+formatted_chat = tokenizer.apply_chat_template(chat, tokenize=True, return_dict=True, continue_final_message=True)
 model.generate(**formatted_chat)
 ```

@@ -219,7 +219,7 @@ The model will generate text that continues the JSON string, rather than startin
 can be very useful for improving the accuracy of the model's instruction-following when you know how you want
 it to start its replies.

-Because `add_generation_prompt` adds the tokens that start a new message, and `continue_last_message` removes any
+Because `add_generation_prompt` adds the tokens that start a new message, and `continue_final_message` removes any
 end-of-message tokens from the final message, it does not make sense to use them together. As a result, you'll
 get an error if you try!

@@ -228,7 +228,7 @@ get an error if you try!
 The default behaviour of `TextGenerationPipeline` is to set `add_generation_prompt=True` so that it starts a new
 message. However, if the final message in the input chat has the "assistant" role, it will assume that this message is
 a prefill and switch to `continue_final_message=True` instead, because most models do not support multiple
-consecutive assistant messages. You can override this behaviour by explicitly passing the `continue_last_message`
+consecutive assistant messages. You can override this behaviour by explicitly passing the `continue_final_message`
 argument when calling the pipeline.
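
For reference, a minimal sketch of the pipeline behaviour described in the last hunk, assuming a chat model with a chat template (the checkpoint name and `max_new_tokens` value are illustrative and not part of the patch):

```python
from transformers import pipeline

# Illustrative checkpoint; any chat model with a chat template behaves the same way.
pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Give me the answer as JSON."},
    # The final message has the "assistant" role, so the pipeline treats it as a
    # prefill and defaults to continue_final_message=True.
    {"role": "assistant", "content": '{"name": "'},
]

# Default: the model continues the prefilled assistant message.
out = pipe(chat, max_new_tokens=32)

# Explicit override: start a new assistant message instead of continuing the prefill.
out = pipe(chat, max_new_tokens=32, continue_final_message=False)
```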