[Doc]: Performance/Optimization Page doesn't mention Pipeline Parallel Size #12012
Closed
Labels: documentation (Improvements or additions to documentation)
📚 The doc issue
In the Page
https://github.com/vllm-project/vllm/blob/main/docs/source/performance/optimization.md
One of the recommended options is to increase `tensor_parallel_size`. The document does not mention increasing `pipeline_parallel_size`, which would also result in the model being sharded across more GPUs, so there is more memory available for the KV cache.

Suggest a potential alternative/fix
Increase `tensor_parallel_size` or `pipeline_parallel_size` (the latter when using Multi-Node Multi-GPU). Either approach shards the model weights, so each GPU has more memory available for the KV cache.