[Chat] Add Chat from TRL 🐈 (huggingface#35714)
* tmp commit

* add working chat

* add docs

* docs 2

* use auto dtype by default
gante authored and elvircrn committed Feb 13, 2025
1 parent 87a49ed commit 9929efc
Showing 7 changed files with 695 additions and 103 deletions.
193 changes: 100 additions & 93 deletions docs/source/en/chat_templating.md

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions docs/source/en/generation_strategies.md
@@ -41,6 +41,13 @@ This guide describes:
* common decoding strategies and their main parameters
* saving and sharing custom generation configurations with your fine-tuned model on 🤗 Hub

<Tip>

`generate()` is a critical component of our [`transformers-cli chat` CLI](quicktour#chat-with-text-generation-models).
Everything you learn in this guide applies there as well.

</Tip>

## Default text generation configuration

A decoding strategy for a model is defined in its generation configuration. When using pre-trained models for inference
6 changes: 6 additions & 0 deletions docs/source/en/llm_tutorial.md
@@ -23,6 +23,12 @@ LLMs, or Large Language Models, are the key component behind text generation. In

Autoregressive generation is the inference-time procedure of iteratively calling a model with its own generated outputs, given a few initial inputs. In 🤗 Transformers, this is handled by the [`~generation.GenerationMixin.generate`] method, which is available to all models with generative capabilities.
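The iterative loop described above can be sketched in a few lines. This is a minimal illustration, not part of the diff: it assumes the `transformers` library with PyTorch installed, and reuses the Qwen/Qwen2.5-0.5B-Instruct checkpoint that appears later in this commit (any causal LM checkpoint should behave the same way).

```python
# Minimal sketch of autoregressive generation via `generate` (illustrative,
# not taken from this commit). The checkpoint is the one used in the commit's
# chat example; any causal LM checkpoint works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto")

inputs = tokenizer("LLMs are the key component behind", return_tensors="pt")
# `generate` iteratively feeds the model its own outputs until a stop
# condition is met (max_new_tokens here, or an end-of-sequence token)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```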

<Tip>

If you want to jump straight to chatting with a model, [try our `transformers-cli chat` CLI](quicktour#chat-with-text-generation-models).

</Tip>

This tutorial will show you how to:

* Generate text with an LLM
26 changes: 26 additions & 0 deletions docs/source/en/quicktour.md
@@ -553,6 +553,32 @@ All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs
>>> model.fit(tf_dataset) # doctest: +SKIP
```


## Chat with text generation models

If you're working with a model that generates text as an output, you can also engage in a multi-turn conversation with
it through the `transformers-cli chat` command. This is the fastest way to interact with a model, e.g. for a
qualitative assessment (aka vibe check).

This CLI is implemented on top of our `AutoClass` abstraction, leveraging our [text generation](llm_tutorial.md) and
[chat](chat_templating.md) tooling, and is therefore compatible with any 🤗 Transformers model. If you have the library
[installed](installation.md), you can launch a chat session from your terminal with

```shell
transformers-cli chat --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct
```

For a full list of options to launch the chat, type

```shell
transformers-cli chat -h
```

After the chat is launched, you will enter an interactive session with the model. There are special commands for this
session as well, such as `clear` to reset the conversation. Type `help` at any moment to display all special chat
commands, and `exit` to terminate the session.
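For readers who prefer the programmatic route, a single chat turn can be sketched with the `AutoClass` and chat-template tooling the CLI builds on. This is an illustrative sketch, not the CLI's actual implementation, and it assumes the same checkpoint as the launch example above:

```python
# One chat turn, sketched with the AutoClass and chat-template tooling the CLI
# builds on (illustrative only; the CLI's internals may differ).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto")

# A conversation is a list of {"role", "content"} dicts, one entry per turn
messages = [{"role": "user", "content": "Hello! Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)
reply = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
messages.append({"role": "assistant", "content": reply})  # the next turn appends here
print(reply)
```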


## What's next?

Now that you've completed the 🤗 Transformers quick tour, check out our guides and learn how to do more specific things like writing a custom model, fine-tuning a model for a task, and how to train a model with a script. If you're interested in learning more about 🤗 Transformers core concepts, grab a cup of coffee and take a look at our Conceptual Guides!
