-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lazy compute input ids #2216
lazy compute input ids #2216
Conversation
# assigns input_ids | ||
# TODO: for dynamic batching, or HF pipeline, tokenizer is applied differently. | ||
if kwargs.get("tokenizer") and kwargs.get("is_rolling_batch"): | ||
request_input.input_ids = tokenizer.encode(request_input.input_text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the tokenizer at all anymore in this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed for chat_completions and bedrock use-cases to apply the chat-template.
if vllm_request_output.prompt_logprobs and not request_output.prompt_tokens_details: | ||
# TODO: Temp check adding the check fo T5. | ||
if isinstance(vllm_request_output.prompt_token_ids, list): | ||
for index, prompt_token_id in enumerate( | ||
vllm_request_output.prompt_token_ids): | ||
prompt_token = Token( | ||
id=prompt_token_id, | ||
text=tokenizer.decode([prompt_token_id]), | ||
text=tokenizer.convert_ids_to_tokens(prompt_token_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can choose to leave this as None, but this method is just a dict lookup
I ran some benchmarks to understand the impact of these changes: For all tests, I used the command:
v0.28.0 container:
current lmi-nightly without changes in this PR:
current lmi + changes in this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @siddvenk
Description
Make input ids computation to lazy, not required all the time