Commit 2efd86e

fix
comaniac committed Dec 13, 2024
1 parent 0d8451c commit 2efd86e
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions vllm/v1/core/scheduler.py
@@ -199,9 +199,13 @@ def schedule(self) -> "SchedulerOutput":
             if num_new_tokens == 0:
                 # The happens when prompt length is divisible by the block
                 # size and all blocks are cached. Now we force to recompute
-                # the last token.
-                num_computed_tokens -= 1
-                num_new_tokens = 1
+                # the last block. Note that we have to re-compute an entire
+                # block because allocate_slots() assumes num_computed_tokens
+                # is always a multiple of the block size. This limitation
+                # can potentially be removed in the future to slightly
+                # improve the performance.
+                num_computed_tokens -= self.block_size
+                num_new_tokens = self.block_size
                 computed_blocks.pop()
             num_new_tokens = min(num_new_tokens, token_budget)
             assert num_new_tokens > 0
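
For context, here is a minimal standalone sketch of the edge case this commit handles. The concrete numbers and list contents below are illustrative assumptions, not vLLM internals, and allocate_slots() appears only in a comment. When the prompt length is an exact multiple of the block size and every block hits the prefix cache, num_new_tokens comes out to zero, so the scheduler rolls back one full block rather than one token:

    # Illustrative values (assumed, not vLLM defaults).
    block_size = 16
    prompt_len = 48                  # divisible by block_size
    computed_blocks = [0, 1, 2]      # all three prompt blocks are cache hits

    num_computed_tokens = len(computed_blocks) * block_size  # 48
    num_new_tokens = prompt_len - num_computed_tokens        # 0: nothing to schedule

    if num_new_tokens == 0:
        # Roll back an entire block (not just one token) so that
        # num_computed_tokens stays a multiple of block_size, the
        # invariant the commit comment says allocate_slots() assumes.
        num_computed_tokens -= block_size   # 48 -> 32
        num_new_tokens = block_size         # recompute 16 tokens
        computed_blocks.pop()               # drop the last cached block

    assert num_computed_tokens % block_size == 0
    assert num_new_tokens > 0

Under the pre-fix logic (num_computed_tokens -= 1; num_new_tokens = 1), the first assert above would fail (47 % 16 != 0), which is exactly the alignment assumption the new comment documents.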
