[Frontend] Set server's maximum number of generated tokens using generation_config.json #12242
Merged
+145
−9
Commits (34)
- `5c85448` Adding max_new_tokens support to generation_config.json (mhendrey)
- `4ad6b45` Changed default_max_tokens to server_max_tokens (mhendrey)
- `95f9c97` Renamed default_max_tokens to server_max_tokens (mhendrey)
- `4786e56` Removed the float("inf") bug (mhendrey)
- `4980a73` Renamed default_max_tokens to server_max_tokens (mhendrey)
- `39d7d76` Rearranged lines to make the changes with existing as small as possible (mhendrey)
- `b6a24c4` Limit generated tokens by server's max_tokens setting when available (mhendrey)
- `aa7cff1` Changed syntax to pass format.sh tests (mhendrey)
- `2f6e43b` [Bugfix] Fix num_heads value for simple connector when tp enabled (#1…) (ShangmingCai)
- `6baa0ea` [torch.compile] fix sym_tensor_indices (#12191) (youkaichao)
- `35b5948` Move linting to `pre-commit` (#11975) (hmellor)
- `0c2f332` [DOC] Fix typo in docstring and assert message (#12194) (terrytangyuan)
- `46249e5` [DOC] Add missing docstring in LLMEngine.add_request() (#12195) (terrytangyuan)
- `0b2e3de` [Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196) (terrytangyuan)
- `090eca3` [Model] Add Qwen2 PRM model support (#12202) (Isotr0py)
- `5d36c1f` [Core] Interface for accessing model from `VllmRunner` (#10353) (DarkLight1337)
- `df331a7` [misc] add placeholder format.sh (#12206) (youkaichao)
- `881964d` [CI/Build] Remove dummy CI steps (#12208) (DarkLight1337)
- `5cc6a09` [CI/Build] Make pre-commit faster (#12212) (DarkLight1337)
- `9f3d5a6` [Model] Upgrade Aria to transformers 4.48 (#12203) (DarkLight1337)
- `957ca23` [misc] print a message to suggest how to bypass commit hooks (#12217) (youkaichao)
- `399d224` [core][bugfix] configure env var during import vllm (#12209) (youkaichao)
- `df06503` [V1] Remove `_get_cache_block_size` (#12214) (heheda12345)
- `b89529b` [Misc] Pass `attention` to impl backend (#12218) (wangxiyuan)
- `a5d57f1` [Bugfix] Fix `HfExampleModels.find_hf_info` (#12223) (DarkLight1337)
- `b1af379` [CI] Pass local python version explicitly to pre-commit mypy.sh (#12224) (heheda12345)
- `0e3a719` Added tests to check max_tokens is properly set (mhendrey)
- `6867b37` Merge branch 'server_max_tokens' (mhendrey)
- `99243cf` Mucked up the rebasing. Fixing that now. (mhendrey)
- `1a15431` Reverting the serving_chat & serving_completion back and putting all … (mhendrey)
- `c10eb1f` Didn't quite revert back. Deleting empty line from both (mhendrey)
- `a3fc62b` Changed to using one-liner and edited engine arg for generation-config (mhendrey)
- `98949f6` Merge branch 'vllm-project:main' into main (mhendrey)
- `c71f429` Converted to a one-liner for taking minimum value & added to generati… (mhendrey)
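The commit messages above describe the core change: capping each request's `max_tokens` with a server-side limit read from the model's `generation_config.json`. A minimal sketch of that logic follows; the helper names (`load_server_max_tokens`, `effective_max_tokens`) are illustrative stand-ins, not the actual functions added in this PR, and the surrounding vLLM serving plumbing is assumed rather than shown.

```python
import json


def load_server_max_tokens(path: str = "generation_config.json"):
    """Read an optional max_new_tokens cap from generation_config.json.

    Returns None when the file or the key is absent, so the server
    falls back to its previous behavior (no extra cap).
    """
    try:
        with open(path) as f:
            config = json.load(f)
    except FileNotFoundError:
        return None
    return config.get("max_new_tokens")


def effective_max_tokens(requested: int, server_max_tokens):
    """One-liner for taking the minimum of the requested and server limits."""
    return min(requested, server_max_tokens) if server_max_tokens is not None else requested
```

Guarding with `None` rather than a sentinel like `float("inf")` sidesteps the class of type mixup one of the commits ("Removed the float('inf') bug") alludes to, since the result stays an `int` whenever a real limit is configured.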
Since `default_sampling_params` is also passed to `request.to_beam_search_params` and `request.to_sampling_params`, let's handle this inside those methods instead.
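The suggestion above is to move the capping out of the individual serving endpoints and into the request-conversion methods themselves, so every code path (sampling and beam search) applies the same limit. A rough sketch of that shape, with heavily simplified stand-in classes rather than vLLM's actual `ChatCompletionRequest` and `SamplingParams` signatures:

```python
from dataclasses import dataclass


@dataclass
class SamplingParams:
    """Simplified stand-in for vLLM's SamplingParams."""
    max_tokens: int


@dataclass
class ChatCompletionRequest:
    """Simplified stand-in for vLLM's request object."""
    max_tokens: int

    def to_sampling_params(self, default_max_tokens: int) -> SamplingParams:
        # Cap the requested token budget inside the conversion method,
        # so callers don't each have to remember to apply the server limit.
        return SamplingParams(max_tokens=min(self.max_tokens, default_max_tokens))
```

A hypothetical `to_beam_search_params` would apply the identical `min(...)` internally, which is the deduplication the reviewer is asking for.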