From b4ce1ed89441da31dbe66ec3574d3b082ed6d97f Mon Sep 17 00:00:00 2001
From: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date: Thu, 13 Feb 2025 18:20:20 +0100
Subject: [PATCH 1/3] Update README.md

---
 README.md | 66 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index a299ab528..d7af7cd64 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ limitations under the License.
 
 `smolagents` is a library that enables you to run powerful agents in a few lines of code. It offers:
 
-✨ **Simplicity**: the logic for agents fits in 1,000 lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code!
+✨ **Simplicity**: the logic for agents fits in ~1,000 lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code!
 
 🧑‍💻 **First-class support for Code Agents**. Our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [E2B](https://e2b.dev/).
 
@@ -137,42 +137,60 @@ model = AzureOpenAIServerModel(
 ```
 
 
-## Command Line Interface
+## CLI
 
-You can run agents from CLI using two commands: `smolagent` and `webagent`. `smolagent` is a generalist command to run a multi-step `CodeAgent` that can be equipped with various tools, meanwhile `webagent` is a specific web-browsing agent using [helium](https://github.com/mherrmann/helium).
+You can run agents from CLI using two commands: `smolagent` and `webagent`.
 
-**Web Browser Agent in CLI**
+`smolagent` is a generalist command to run a multi-step `CodeAgent` that can be equipped with various tools.
 
-`webagent` allows users to automate web browsing tasks. It uses the [helium](https://github.com/mherrmann/helium) library to interact with web pages and uses defined tools to browse the web. Read more about this agent [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).
-
-Run the following command to get started:
 ```bash
-webagent {YOUR_PROMPT_HERE} --model-type "LiteLLMModel" --model-id "gpt-4o"
+smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7." --model-type "HfApiModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" --imports "pandas numpy" --tools "web_search translation"
 ```
 
+Meanwhile `webagent` is a specific web-browsing agent using [helium](https://github.com/mherrmann/helium) (read more [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py)).
+
 For instance:
 ```bash
-webagent "go to xyz.com/women, get to sale section, click the first clothing item you see. Get the product details, and the price, return them. note that I'm shopping from France"
+webagent "go to xyz.com/men, get to sale section, click the first clothing item you see. Get the product details, and the price, return them. note that I'm shopping from France" --model-type "LiteLLMModel" --model-id "gpt-4o"
 ```
 
-We redacted the website here, modify it with the website of your choice.
-
-**CodeAgent in CLI**
-Use `smolagent` to run a multi-step agent with [tools](https://huggingface.co/docs/smolagents/en/reference/tools). It uses web search tool by default.
-You can easily get started with `$ smolagent {YOUR_PROMPT_HERE}`. You can customize this as follows (more details [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/cli.py)).
-
-```bash
-smolagent {YOUR_PROMPT_HERE} --model-type "HfApiModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" --imports "pandas numpy" --tools "web_search translation"
+## How do Code agents work?
+
+Our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) works mostly like classical ReAct agents - the exception being that the LLM engine writes its actions as Python code snippets.
+
+```mermaid
+flowchart TB
+    Task[User Task]
+    Memory[agent.memory]
+    Generate[Generate from agent.model]
+    Execute[Execute Code action - Tool calls are written as functions]
+    Answer[Return the argument given to 'final_answer']
+
+    Task -->|Add task to agent.memory| Memory
+
+    subgraph ReAct[ReAct loop]
+        Memory -->|Memory as chat messages| Generate
+        Generate -->|Parse output to extract code action| Execute
+        Execute -->|No call to 'final_answer' tool => Store execution logs in memory and keep running| Memory
+    end
+
+    Execute -->|Call to 'final_answer' tool| Answer
+
+    %% Styling
+    classDef default fill:#d4b702,stroke:#8b7701,color:#ffffff
+    classDef io fill:#4a5568,stroke:#2d3748,color:#ffffff
+
+    class Task,Answer io
 ```
 
-For instance:
-```bash
-smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7. Allocate time according to number of public attraction in each, and optimize for distance and travel time. Bring all the public transportation options."
-```
-
-## Code agents?
+Actions are now python code snippets. Hence, tool calls will be performed like python function calls. For instance, here is how the agent can perform web search over several websites in one single action:
+```py
+requests_to_search = ["gulf of mexico america", "greenland denmark", "tariffs"]
+for request in requests_to_search:
+    print(f"Here are the search results for {request}:", web_search(request))
+```
 
-In our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent), the LLM engine writes its actions in code. This approach is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to calls: [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls) and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more on that.
+Writing actions as code snippets is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to calls: [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls) and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more on that.
 
 Especially, since code execution can be a security concern (arbitrary code execution!), we provide options at runtime:
 - a secure python interpreter to run code more safely in your environment (more secure than raw code execution but still risky)

From cc95538cfc0d046909b6d9d9d81ec267be1a87d2 Mon Sep 17 00:00:00 2001
From: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date: Fri, 14 Feb 2025 11:58:15 +0100
Subject: [PATCH 2/3] Update README.md

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d7af7cd64..7cb184087 100644
--- a/README.md
+++ b/README.md
@@ -190,7 +190,7 @@ for request in requests_to_search:
     print(f"Here are the search results for {request}:", web_search(request))
 ```
 
-Writing actions as code snippets is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to calls: [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls) and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more on that.
+Writing actions as code snippets is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to call: [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls) and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more on that.
 
 Especially, since code execution can be a security concern (arbitrary code execution!), we provide options at runtime:
 - a secure python interpreter to run code more safely in your environment (more secure than raw code execution but still risky)

From 29a5c831e2f8bc9f8d3cb2aa26ffc2d9a1f0899c Mon Sep 17 00:00:00 2001
From: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date: Fri, 14 Feb 2025 11:58:21 +0100
Subject: [PATCH 3/3] Update README.md

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7cb184087..89c902bcb 100644
--- a/README.md
+++ b/README.md
@@ -183,7 +183,7 @@ flowchart TB
     class Task,Answer io
 ```
 
-Actions are now python code snippets. Hence, tool calls will be performed like python function calls. For instance, here is how the agent can perform web search over several websites in one single action:
+Actions are now Python code snippets. Hence, tool calls will be performed as Python function calls. For instance, here is how the agent can perform web search over several websites in one single action:
 ```py
 requests_to_search = ["gulf of mexico america", "greenland denmark", "tariffs"]
 for request in requests_to_search: