Docs for CLI attachments, refs #587
simonw committed Oct 28, 2024
1 parent 570a3ec commit f0ed54a
Showing 4 changed files with 98 additions and 49 deletions.
45 changes: 31 additions & 14 deletions docs/help.md
@@ -86,20 +86,37 @@ Usage: llm prompt [OPTIONS] [PROMPT]
Documentation: https://llm.datasette.io/en/stable/usage.html
Options:
-s, --system TEXT System prompt to use
-m, --model TEXT Model to use
-o, --option <TEXT TEXT>... key/value options for the model
-t, --template TEXT Template to use
-p, --param <TEXT TEXT>... Parameters for template
--no-stream Do not stream output
-n, --no-log Don't log to database
--log Log prompt and response to the database
-c, --continue Continue the most recent conversation.
--cid, --conversation TEXT Continue the conversation with the given ID.
--key TEXT API key to use
--save TEXT Save prompt with this template name
--help Show this message and exit.
Examples:
llm 'Capital of France?'
llm 'Capital of France?' -m gpt-4o
llm 'Capital of France?' -s 'answer in Spanish'
Multi-modal models can be called with attachments like this:
llm 'Extract text from this image' -a image.jpg
llm 'Describe' -a https://static.simonwillison.net/static/2024/pelicans.jpg
cat image | llm 'describe image' -a -
# With an explicit content type:
cat image | llm 'describe image' --at - image/jpeg
Options:
-s, --system TEXT System prompt to use
-m, --model TEXT Model to use
-a, --attachment ATTACHMENT Attachment path or URL or -
--at, --attachment-type <TEXT TEXT>...
Attachment with explicit mimetype
-o, --option <TEXT TEXT>... key/value options for the model
-t, --template TEXT Template to use
-p, --param <TEXT TEXT>... Parameters for template
--no-stream Do not stream output
-n, --no-log Don't log to database
--log Log prompt and response to the database
-c, --continue Continue the most recent conversation.
--cid, --conversation TEXT Continue the conversation with the given ID.
--key TEXT API key to use
--save TEXT Save prompt with this template name
--help Show this message and exit.
```

(help-chat)=
11 changes: 7 additions & 4 deletions docs/index.md
@@ -46,10 +46,13 @@ If you have an [OpenAI API key](https://platform.openai.com/api-keys) you can
# Paste your OpenAI API key into this
llm keys set openai

# Run a prompt
# Run a prompt (with the default gpt-4o-mini model)
llm "Ten fun names for a pet pelican"

# Run a system prompt against a file
# Extract text from an image
llm "extract text" -a scanned-document.jpg

# Use a system prompt against a file
cat myfile.py | llm -s "Explain this code"
```
Or you can {ref}`install a plugin <installing-plugins>` and use models that can run on your local device:
@@ -62,10 +65,10 @@ llm -m orca-mini-3b-gguf2-q4_0 'What is the capital of France?'
```
To start {ref}`an interactive chat <usage-chat>` with a model, use `llm chat`:
```bash
llm chat -m gpt-4o-mini
llm chat -m gpt-4o
```
```
Chatting with gpt-4o-mini
Chatting with gpt-4o
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Tell me a joke about a pelican
81 changes: 51 additions & 30 deletions docs/usage.md
@@ -45,49 +45,30 @@ Some models support options. You can pass these using `-o/--option name value` -
```bash
llm 'Ten names for cheesecakes' -o temperature 1.5
```
### Attachments

(usage-completion-prompts)=
## Completion prompts

Some models are completion models - rather than being tuned to respond to chat style prompts, they are designed to complete a sentence or paragraph.

An example of this is the `gpt-3.5-turbo-instruct` OpenAI model.
Some models are multi-modal, which means they can accept more than just text as input. GPT-4o and GPT-4o mini can accept images, and models such as Google Gemini 1.5 can accept audio and video as well.

You can prompt that model the same way as the chat models, but be aware that the prompt format that works best is likely to differ.
LLM calls these **attachments**. You can pass attachments using the `-a` option like this:

```bash
llm -m gpt-3.5-turbo-instruct 'Reasons to tame a wild beaver:'
llm "describe this image" -a https://static.simonwillison.net/static/2024/pelicans.jpg
```

(conversation)=
## Continuing a conversation

By default, the tool will start a new conversation each time you run it.

You can opt to continue the previous conversation by passing the `-c/--continue` option:
Attachments can be passed using URLs or file paths, and you can attach more than one attachment to a single prompt:
```bash
llm 'More names' -c
llm "describe these images" -a image1.jpg -a image2.jpg
```
This will re-send the prompts and responses for the previous conversation as part of the call to the language model. Note that this can add up quickly in terms of tokens, especially if you are using expensive models.

`--continue` will automatically use the same model as the conversation that you are continuing, even if you omit the `-m/--model` option.

To continue a conversation that is not the most recent one, use the `--cid/--conversation <id>` option:
You can also pipe an attachment to LLM by using `-` as the filename:
```bash
llm 'More names' --cid 01h53zma5txeby33t1kbe3xk8q
cat image.jpg | llm "describe this image" -a -
```
You can find these conversation IDs using the `llm logs` command.

## Using with a shell

To learn more about your computer's operating system based on the output of `uname -a`, run this:
LLM will attempt to automatically detect the content type of the image. If this doesn't work, you can instead use the `--attachment-type` option (`--at` for short), which takes the URL/path plus an explicit content type:
```bash
llm "Tell me about my operating system: $(uname -a)"
cat myfile | llm "describe this image" --at - image/jpeg
```
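Automatic detection of this kind typically works by inspecting a file's leading "magic bytes". Here is a minimal sketch of such sniffing, to show why a pipe with no filename may need the explicit `--at` override; this is not necessarily how LLM itself implements it:

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess an image content type from its magic bytes.
    Simplified illustration; real detection covers many more formats."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"GIF87a") or data.startswith(b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None  # unknown: the caller needs an explicit content type
```

When the bytes don't match a known signature the function returns `None`, which is the situation where an explicit content type becomes necessary.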
This pattern of using `$(command)` inside a double quoted string is a useful way to quickly assemble prompts.

(system-prompts)=
## System prompts
### System prompts

You can use `-s/--system '...'` to set a system prompt.
```bash
@@ -120,6 +101,46 @@ cat llm/utils.py | llm -t pytest
```
See {ref}`prompt templates <prompt-templates>` for more.

(conversation)=
### Continuing a conversation

By default, the tool will start a new conversation each time you run it.

You can opt to continue the previous conversation by passing the `-c/--continue` option:
```bash
llm 'More names' -c
```
This will re-send the prompts and responses for the previous conversation as part of the call to the language model. Note that this can add up quickly in terms of tokens, especially if you are using expensive models.
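Because each new prompt re-sends the full history, the cumulative tokens sent across a conversation grow roughly quadratically with its length. A small sketch with hypothetical per-turn token counts (response tokens omitted for simplicity):

```python
def total_tokens_sent(turn_tokens):
    """Sum the tokens sent across a conversation where every new
    prompt re-sends all prior turns. Illustrative model only."""
    sent = 0
    history = 0
    for t in turn_tokens:
        sent += history + t  # this request = full history + new prompt
        history += t         # the new prompt joins the history
    return sent
```

Three turns of 100 tokens each would send 600 tokens in total rather than 300, and the gap widens with every additional turn.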

`--continue` will automatically use the same model as the conversation that you are continuing, even if you omit the `-m/--model` option.

To continue a conversation that is not the most recent one, use the `--cid/--conversation <id>` option:
```bash
llm 'More names' --cid 01h53zma5txeby33t1kbe3xk8q
```
You can find these conversation IDs using the `llm logs` command.

### Tips for using LLM with Bash or Zsh

To learn more about your computer's operating system based on the output of `uname -a`, run this:
```bash
llm "Tell me about my operating system: $(uname -a)"
```
This pattern of using `$(command)` inside a double quoted string is a useful way to quickly assemble prompts.
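As a sketch (assuming a POSIX shell), you can build the prompt in a variable first to inspect exactly what will be sent before passing it along:

```shell
# Double quotes keep the command output as a single argument,
# so the substituted text is not split into separate words.
os_info=$(uname -a)
prompt="Tell me about my operating system: ${os_info}"
printf '%s\n' "$prompt"
```

Once the prompt looks right, pass `"$prompt"` (quoted) to `llm` as a single argument.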

(usage-completion-prompts)=
### Completion prompts

Some models are completion models - rather than being tuned to respond to chat-style prompts, they are designed to complete a sentence or paragraph.

An example of this is the `gpt-3.5-turbo-instruct` OpenAI model.

You can prompt that model the same way as the chat models, but be aware that the prompt format that works best is likely to differ.

```bash
llm -m gpt-3.5-turbo-instruct 'Reasons to tame a wild beaver:'
```

(usage-chat)=

## Starting an interactive chat
10 changes: 9 additions & 1 deletion llm/cli.py
@@ -221,7 +221,15 @@ def prompt(
llm 'Capital of France?'
llm 'Capital of France?' -m gpt-4o
llm 'Capital of France?' -s 'answer in Spanish'
Multi-modal models can be called with attachments like this:
\b
llm 'Extract text from this image' -a image.jpg
llm 'Describe' -a https://static.simonwillison.net/static/2024/pelicans.jpg
cat image | llm 'describe image' -a -
# With an explicit mimetype:
cat image | llm 'describe image' --at - image/jpeg
"""
if log and no_log:
raise click.ClickException("--log and --no-log are mutually exclusive")
@@ -356,7 +364,7 @@ def read_prompt():

try:
response = prompt_method(
prompt, *resolved_attachments, system=system, **validated_options
prompt, attachments=resolved_attachments, system=system, **validated_options
)
if should_stream:
for chunk in response:
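The change above switches from passing attachments positionally (`*resolved_attachments`) to an explicit `attachments=` keyword. A minimal sketch (illustrative names, not llm's internal API) of why a keyword-only parameter keeps such a signature unambiguous as more optional parameters accumulate:

```python
def prompt_method(prompt, *, attachments=None, system=None, **options):
    """Hypothetical stand-in for a model's prompt method: keyword-only
    parameters mean callers can never confuse attachments with other
    positional values, no matter how the signature grows."""
    attachments = attachments or []
    return {
        "prompt": prompt,
        "attachments": list(attachments),
        "system": system,
        "options": options,
    }


response = prompt_method("Describe", attachments=["image.jpg"])
```

Had attachments stayed positional, adding any new positional parameter later would silently shift what each argument means at every call site.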
