build(deps): update vllm requirement from <=0.6.3 to <=0.7.3 #335

dependabot · 2025-02-21T05:27:59Z

Updates the requirements on vllm to permit the latest version.
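
To illustrate what widening the upper bound from `<=0.6.3` to `<=0.7.3` permits, here is a minimal sketch that compares purely numeric dotted versions. The helper names `parse_release` and `satisfies_upper_bound` are hypothetical, and the comparison deliberately ignores pre-, post-, and dev-release suffixes such as `.post1`. Real installers apply the full PEP 440 rules, so treat this as an approximation rather than the resolver's actual logic.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '0.7.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(installed: str, upper_bound: str) -> bool:
    """Return True if installed <= upper_bound, comparing release segments only."""
    a, b = parse_release(installed), parse_release(upper_bound)
    # Pad the shorter tuple with zeros so that '0.7' compares equal to '0.7.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a <= b


# Compare a few candidate versions against the old and the new upper bound.
old_bound, new_bound = "0.6.3", "0.7.3"
for candidate in ("0.6.3", "0.7.2", "0.7.3", "0.8.0"):
    print(
        candidate,
        "old bound:", satisfies_upper_bound(candidate, old_bound),
        "new bound:", satisfies_upper_bound(candidate, new_bound),
    )
```

Under this simplified comparison, 0.7.2 and 0.7.3 are rejected by the old bound and accepted by the new one, while 0.8.0 is rejected by both.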

Release notes

v0.7.3

Highlights

🎉 253 commits from 93 contributors, including 29 new contributors!

DeepSeek enhancements:

Support for DeepSeek Multi-Token Prediction, 1.69x speedup in low QPS scenarios (#12755)

AMD support: DeepSeek tunings, yielding 17% latency reduction (#13199)

Using FlashAttention3 for MLA (#12807)

Align the expert selection code path with official implementation (#13474)

Optimize moe_align_block_size for deepseek_v3 (#12850)

V1 Engine:

LoRA Support (#10957, #12883)

Logprobs and prompt logprobs support (#9880), min_p sampling support (#13191), logit_bias in v1 Sampler (#13079)

Use msgpack for core request serialization (#12918)

Pipeline parallelism support (#12996, #13353, #13472, #13417, #13315)

Metrics enhancements: GPU prefix cache hit rate % gauge (#12592), iteration_tokens_total histogram (#13288), several request timing histograms (#12644)

Initial speculative decoding support with ngrams (#12193, #13365)
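
For readers unfamiliar with the min_p sampling mentioned in the V1 Engine list above, the following is a minimal sketch of the general technique, not vLLM's implementation. It assumes the common definition: a token is kept only if its probability is at least `min_p` times the probability of the most likely token, and the surviving probabilities are renormalized. The function name `min_p_filter` is hypothetical.

```python
import math


def min_p_filter(logits: list[float], min_p: float) -> list[float]:
    """Return renormalized probabilities after min_p filtering (illustrative only)."""
    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep tokens whose probability is at least min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


filtered = min_p_filter([2.0, 1.0, -3.0], min_p=0.1)
print(filtered)
```

With `min_p=0.1`, the third token in this example falls below the threshold and receives probability zero. With `min_p=0.0`, nothing is filtered.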

Model Support

Enhancement to Qwen2.5-VL: BNB support (#12944), LoRA (#13261), Optimizations (#13155)

Support Unsloth Dynamic 4bit BnB quantization (#12974)

IBM/NASA Prithvi Geospatial model (#12830)

Support Mamba2 (Codestral Mamba) (#9292), Bamba Model (#10909)

Ultravox Model: Support v0.5 Release (#12912)

transformers backend:

Enable quantization support for transformers backend (#12960)

Set torch_dtype in TransformersModel (#13088)

VLM:

Implement merged multimodal processor for Mllama (#11427), GLM4V (#12449), Molmo (#12966)

Separate text-only and vision variants of the same model architecture (#13157)

Hardware Support

Pluggable platform-specific scheduler (#13161)

NVIDIA: Support nvfp4 quantization (#12784)

AMD:

Per-Token-Activation Per-Channel-Weight FP8 (#12501)

Tuning for Mixtral on MI325 and Qwen MoE on MI300 (#13503), Mixtral8x7B on MI300 (#13577)

Add initial ROCm support to V1 (#12790)

TPU: V1 Support (#13049)

Neuron: Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)

Gaudi:

Support Contiguous Cache Fetch (#12139)

Enable long-contexts + LoRA support (#12812)

Engine Feature

Add sleep and wake up endpoint and v1 support (#12987)

Add /v1/audio/transcriptions OpenAI API endpoint (#12909)

Performance

Reduce TTFT with concurrent partial prefills (#10235)

... (truncated)

Commits

ed6e907 [Bugfix] Fix deepseekv3 grouped topk error (#13474)
992e5c3 Merge similar examples in offline_inference into single basic example (#1...
b69692a [Kernel] LoRA - Refactor sgmv kernels (#13110)
a64a844 [2/n][ci] S3: Use full model path (#13564)
aa1e62d [ci] Fix spec decode test (#13600)
497bc83 [CI/Build] Use uv in the Dockerfile (#13566)
3738e6f [API Server] Add port number range validation (#13506)
0023cd2 [ROCm] MI300A compile targets deprecation (#13560)
041e294 [Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)
9621667 [Misc] Warn if the vLLM version can't be retrieved (#13501)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [vllm](https://github.com/vllm-project/vllm) to permit the latest version.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Commits](vllm-project/vllm@v0.1.0...v0.7.3)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot bot added the dependencies Pull requests that update a dependency file label Feb 21, 2025

dependabot bot mentioned this pull request Feb 21, 2025

build(deps): bump vllm from 0.6.3.post1 to 0.7.2 #307

Closed
