About watermarks of physical memory allocation #263
In the current version of the code (0.1.1), I noticed that both the can_allocate() and can_swap_in() methods of the BlockSpaceManager class take the watermark into account, while can_append_slot() doesn't. It seems that all three should apply the same mechanism for GPU memory management.
Replies: 1 comment
@LinPoly Thanks for the question! This is not a bug. The watermark exists to prevent frequent preemptions (i.e., swapping or recomputation), which can be caused by accepting too many new requests into the batch. For the requests already in the batch, we want them to be able to use every slot in the KV cache.
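To make the asymmetry concrete, here is a minimal sketch of the idea. It is a toy model, not the actual BlockSpaceManager code: the class name `ToyBlockManager`, the `allocate()` helper, the integer block counts, and the watermark fraction of 0.25 are all assumptions chosen for illustration. Only the method names can_allocate() and can_append_slot() come from the discussion above.

```python
class ToyBlockManager:
    """Toy illustration of a watermark check (hypothetical, not vLLM's code)."""

    def __init__(self, num_gpu_blocks: int, watermark: float) -> None:
        self.num_free_blocks = num_gpu_blocks
        # Blocks held in reserve whenever new work is admitted.
        self.watermark_blocks = int(watermark * num_gpu_blocks)

    def can_allocate(self, num_required_blocks: int) -> bool:
        # Admitting a NEW request: the reserve must stay intact afterwards.
        remaining = self.num_free_blocks - num_required_blocks
        return remaining >= self.watermark_blocks

    def can_append_slot(self, num_running_seqs: int) -> bool:
        # Growing EXISTING requests: each running sequence may need at most
        # one more block, and the reserve is allowed to be consumed.
        return num_running_seqs <= self.num_free_blocks

    def allocate(self, num_blocks: int) -> None:
        # Hypothetical helper: hand out blocks without any further checks.
        self.num_free_blocks -= num_blocks


manager = ToyBlockManager(num_gpu_blocks=8, watermark=0.25)  # reserve = 2 blocks
fits_exactly = manager.can_allocate(6)      # leaves exactly the reserve
dips_into_reserve = manager.can_allocate(7) # would leave only 1 block
manager.allocate(6)                         # 2 free blocks remain
new_request_ok = manager.can_allocate(1)    # new work is refused at this point
running_can_grow = manager.can_append_slot(2)  # running sequences may still grow
```

In this sketch, once only the reserved blocks are left, new requests are turned away while the two running sequences can still each claim a block. That reserve is what gives running requests room to grow before a preemption becomes necessary.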