[python] Use vllm chat object #2659
Merged
Commits (7):

- 14c1d6d  [python] Use vllm chat object (xyang16)
- d1ca2b3  Update (xyang16)
- eb9ff28  Update (xyang16)
- 16a51fd  [python] Use vllm chat object (xyang16)
- 545293f  pass all mm_data from parsed chat to request input (siddvenk)
- 7fe1755  use max_tokens instead of max_new_tokens in test chat client (siddvenk)
- bc74b52  Add use_vllm_chat_completions() (xyang16)
engines/python/setup/djl_python/chat_completions/vllm_chat_properties.py (new file, 24 additions, 0 deletions):
#!/usr/bin/env python
#
# Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file
# except in compliance with the License. A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS"
# BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for
# the specific language governing permissions and limitations under the License.
from typing import Optional
from pydantic import Field
from vllm.entrypoints.openai.protocol import ChatCompletionRequest


class ChatProperties(ChatCompletionRequest):
    """
    Chat input parameters for chat completions API.
    See https://platform.openai.com/docs/api-reference/chat/create
    """

    model: Optional[str] = Field(default=None, exclude=True)  # Unused
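The subclass above reuses vllm's pydantic request model and marks the unused `model` field as excluded from serialization. A rough stdlib-only sketch of that exclude-on-dump pattern (every class and method here is a hypothetical stand-in, not the vllm or pydantic API):

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ChatCompletionRequestSketch:
    # Stand-in for the upstream request model; real field set is much larger.
    messages: list
    max_tokens: Optional[int] = None
    stream: bool = False


@dataclass
class ChatPropertiesSketch(ChatCompletionRequestSketch):
    # Accepted on input but always dropped on dump, like Field(exclude=True).
    model: Optional[str] = None

    def model_dump(self, exclude_none=False, exclude=()):
        excluded = set(exclude) | {"model"}
        out = {}
        for f in fields(self):
            if f.name in excluded:
                continue
            value = getattr(self, f.name)
            if exclude_none and value is None:
                continue
            out[f.name] = value
        return out


req = ChatPropertiesSketch(messages=[{"role": "user", "content": "hi"}],
                           model="ignored", stream=True)
print(req.model_dump(exclude_none=True, exclude={"messages"}))  # → {'stream': True}
```

The point of the pattern: callers may still send `model` (OpenAI clients always do), but it never leaks into the engine-side parameter dict.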
engines/python/setup/djl_python/chat_completions/vllm_chat_utils.py (new file, 81 additions, 0 deletions):
#!/usr/bin/env python
#
# Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file
# except in compliance with the License. A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS"
# BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for
# the specific language governing permissions and limitations under the License.
from typing import Dict, List, Optional, Union

from djl_python.chat_completions.vllm_chat_properties import ChatProperties
from djl_python.properties_manager.properties import Properties
from vllm.entrypoints.chat_utils import (ChatCompletionMessageParam,
                                         apply_hf_chat_template,
                                         apply_mistral_chat_template,
                                         parse_chat_messages)


def is_chat_completions_request(inputs: Dict) -> bool:
    return "messages" in inputs


def parse_chat_completions_request_vllm(
    input_map: Dict,
    is_rolling_batch: bool,
    rolling_batch,
    tokenizer,
    chat_template: Optional[str] = None,
    image_token: Optional[str] = None,
    configs: Properties = None,
    is_mistral_tokenizer: bool = False,
):
    # Chat completions can either be a rolling batch or no batching.
    if not (is_rolling_batch or configs.batch_size == 1):
        raise ValueError(
            "chat completions support is not currently available for dynamic batching. "
            "You must enable rolling batch to use the chat completions format."
        )

    if not is_mistral_tokenizer and not hasattr(tokenizer,
                                                "apply_chat_template"):
        raise AttributeError(
            f"Cannot provide chat completion for tokenizer: {tokenizer.__class__}, "
            f"please ensure that your tokenizer supports chat templates.")

    chat_params = ChatProperties(**input_map)
    exclude = {"messages"}
    param = chat_params.model_dump(exclude_none=True, exclude=exclude)

    conversation, mm_data = parse_chat_messages(
        chat_params.messages, rolling_batch.get_model_config(), tokenizer)

    prompt_data: Union[str, List[int]]
    if is_mistral_tokenizer:
        text_inputs = apply_mistral_chat_template(
            tokenizer,
            messages=chat_params.messages,
            chat_template=chat_template,
            add_generation_prompt=True,
        )
    else:
        text_inputs = apply_hf_chat_template(
            tokenizer,
            conversation=conversation,
            chat_template=chat_template,
            add_generation_prompt=True,
        )

    param["details"] = True  # Enable details for chat completions
    param["output_formatter"] = ("jsonlines_chat"
                                 if chat_params.stream else "json_chat")

    if mm_data:
        param.update(mm_data)

    # In the case of mistral, text_inputs = List[TokenIds], else str
    return text_inputs, param
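The routing decisions in the helper above are small enough to restate without the vllm dependency. A minimal sketch (the two function names mirror the diff; everything else is illustrative):

```python
from typing import Dict


def is_chat_completions_request(inputs: Dict) -> bool:
    # A request is treated as chat-style when it carries an OpenAI-style
    # "messages" list rather than a raw prompt payload.
    return "messages" in inputs


def select_output_formatter(stream: bool) -> str:
    # Streaming chat responses are emitted as JSON Lines, non-streaming as a
    # single JSON body — mirrors the param["output_formatter"] assignment.
    return "jsonlines_chat" if stream else "json_chat"


print(is_chat_completions_request(
    {"messages": [{"role": "user", "content": "hi"}]}))  # → True
print(is_chat_completions_request({"inputs": "hi"}))     # → False
print(select_output_formatter(stream=True))              # → jsonlines_chat
```

Note also the ordering in the real helper: multimodal data from `parse_chat_messages` is merged into `param` last, so image/audio inputs ride along with the sampling parameters into the request.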
Review comment: Is it possible to base this choice on the config option.rolling_batch=x?
Reply: option.rolling_batch may be set to auto, which resolves to lmi-dist or trtllm depending on the container, so it's hard to tell which rolling batch implementation is in use.
Reply: Maybe we could set a config within the rolling batch class, like use_vllm_chat_completions? I think I would prefer that, since I'm not sure whether using VllmRollingBatch with Neuron (a valid use case) supports some of the utilities we are using from vllm, given that those come from Neuron's vllm repo.
Reply: Sounds good. Added use_vllm_chat_completions().
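The use_vllm_chat_completions() flag agreed on above can be pictured as a per-engine capability hook. A sketch under stated assumptions (the class hierarchy and dispatch function here are illustrative, not the actual djl_python code):

```python
class RollingBatch:
    """Illustrative base class; the real djl_python hierarchy differs."""

    def use_vllm_chat_completions(self) -> bool:
        # Default: an engine does not use vllm's chat-template utilities.
        return False


class VllmRollingBatch(RollingBatch):

    def use_vllm_chat_completions(self) -> bool:
        # vLLM-backed engines opt in to the vllm-native chat parsing path.
        return True


def pick_chat_parser(rolling_batch: RollingBatch) -> str:
    # Hypothetical dispatch: route to the vllm-native parser only when the
    # engine advertises support, sidestepping option.rolling_batch=auto.
    return ("parse_chat_completions_request_vllm"
            if rolling_batch.use_vllm_chat_completions()
            else "parse_chat_completions_request")


print(pick_chat_parser(VllmRollingBatch()))  # → parse_chat_completions_request_vllm
print(pick_chat_parser(RollingBatch()))      # → parse_chat_completions_request
```

Keying the decision on the engine object itself, rather than on the option.rolling_batch string, avoids the auto-resolution ambiguity raised in the thread.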