Support speculative decoding #59
Labels
feature
Categorizes issue or PR as related to a new feature.
needs-priority
Indicates a PR lacks a label and requires one.
needs-triage
Indicates an issue or PR lacks a label and requires one.
What would you like to be added:
Speculative decoding accelerates inference for large language models, and vLLM already supports it natively.
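For context, here is a minimal, library-free sketch of the draft-then-verify idea behind speculative decoding, restricted to greedy decoding. It is not vLLM's implementation: the names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, the "models" are plain functions, and the bonus token that real engines usually append when every proposal is accepted is left out for brevity. Production engines typically verify all proposed positions in one batched forward pass and use rejection sampling for non-greedy decoding.

```python
from typing import Callable, List

# Assumption for illustration: a "model" is any function that maps a token
# sequence to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference: plain autoregressive greedy decoding, one token per step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position in order.
        #    (A real engine scores all positions in a single batched pass;
        #    this sketch loops for clarity.)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return tokens


# Toy models: the target counts upward modulo 10; the draft agrees except
# after tokens where last % 3 == 2, where it guesses 0.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10
```

Under greedy decoding the output matches what the target model alone would produce, so the speedup comes only from accepting several draft tokens per verification round, not from changing the result.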
Why is this needed:
Improve the inference throughput.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.