
Add support for Tool Calling in vLLM #344

Open

casper-hansen opened this issue Feb 22, 2025 · 0 comments

According to the vLLM docs, you can specify tools and register custom tool parsers (example below).

Why is this useful:

  • Tool calling is useful in general because it lets the model augment its responses with external data
  • We can train models to run function calling dynamically while the model is generating
    • This needs a custom tool parser, e.g. you could teach the model to call an API to retrieve additional data (see the ExampleToolParser below)
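
For context, this is roughly what specifying tools looks like with vLLM's offline chat() API today (a minimal sketch based on the vLLM docs; the model name and the get_weather tool are placeholders):

from vllm import LLM, SamplingParams

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any tool-capable model
messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# chat() renders the messages and tool schemas through the chat template;
# generate() has no equivalent `tools` argument, which is the gap this
# issue is about.
outputs = llm.chat(messages, SamplingParams(max_tokens=256), tools=tools)
print(outputs[0].outputs[0].text)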

Problems with implementing this in veRL:

  • veRL uses inference_engine.generate(), which does not support tool calling (it is only supported in chat())
    • This potentially needs support in vLLM to make it happen.
    • The main challenge is that generate currently processes raw text prompts without interpreting structured data (e.g., function calls), whereas chat converts structured messages into prompts and integrates tools. A possible workaround is sketched after the parser example below.
# Skeleton of a custom tool parser, adapted from the vLLM docs. Imports added
# for completeness; exact module paths may vary between vLLM versions.
from collections.abc import Sequence
from typing import Union

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                              DeltaMessage,
                                              ExtractedToolCallInformation)
from vllm.entrypoints.openai.tool_parsers import ToolParser, ToolParserManager
from vllm.transformers_utils.tokenizer import AnyTokenizer


@ToolParserManager.register_module(["example"])
class ExampleToolParser(ToolParser):
    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # Adjust the request, e.g. set skip_special_tokens to False so that
    # special tokens delimiting tool calls survive into the output.
    def adjust_request(
            self, request: ChatCompletionRequest) -> ChatCompletionRequest:
        return request

    # Implement the tool call parsing for streaming calls. This stub passes
    # the raw delta through unchanged; a real parser would detect and extract
    # partial tool calls from the accumulated text.
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        return DeltaMessage(content=delta_text)

    # Implement the tool call parsing for non-streaming calls. This stub
    # reports no tool calls and returns the model output as plain content.
    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        return ExtractedToolCallInformation(tools_called=False,
                                            tool_calls=[],
                                            content=model_output)
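
To use a registered parser like this with the OpenAI-compatible server, the vLLM docs describe loading it as a plugin (e.g. vllm serve ... --enable-auto-tool-choice --tool-parser-plugin /path/to/parser.py --tool-call-parser example).

Until generate() grows native support, one possible workaround is to do by hand what chat() does: render the messages and tool schemas into a prompt via the tokenizer's chat template, call generate() on the raw prompt, and run a tool parser over the completion. A minimal sketch, assuming a model whose chat template accepts tools (the model name and tool schema are placeholders, and the stub parser above ignores its request argument):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model)
llm = LLM(model=model)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, as above
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# Do the work chat() would do: bake the tool schemas into the prompt text.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
completion = outputs[0].outputs[0].text

# Recover structured calls from the raw text with the parser defined above
# (request is unused by the stub, so None is passed here for illustration).
parser = ExampleToolParser(tokenizer)
parsed = parser.extract_tool_calls(completion, request=None)
print(parsed.tools_called, parsed.content)

This still leaves open how to feed tool results back into the model mid-generation during RL rollouts, which is the part that likely needs changes in vLLM itself.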