Only change cache manager for tp>1
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
tdoublep committed Jul 4, 2024
1 parent b040645 commit 4dd9367
Showing 1 changed file with 2 additions and 1 deletion.
Showing 1 changed file: vllm/executor/multiproc_gpu_executor.py (2 additions, 1 deletion)

@@ -44,7 +44,8 @@ def _init_executor(self) -> None:
     os.environ["OMP_NUM_THREADS"] = "1"

     # workaround for https://github.com/vllm-project/vllm/issues/6103
-    maybe_set_triton_cache_manager()
+    if world_size > 1:
+        maybe_set_triton_cache_manager()

     assert world_size <= cuda_device_count_stateless(), (
         "please set tensor_parallel_size to less than max local gpu count")
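The change gates the Triton cache-manager workaround so it runs only when more than one worker process exists (tensor parallelism > 1), since the underlying issue is processes racing on a shared Triton kernel cache. A minimal, self-contained sketch of this gating pattern is below; `maybe_set_triton_cache_manager` here is an illustrative stand-in (its body, the `TRITON_CACHE_MANAGER` value, and the `my_pkg.triton_cache:CustomCacheManager` target are assumptions for this sketch, not vLLM's actual implementation):

```python
import os

def maybe_set_triton_cache_manager() -> None:
    """Illustrative stand-in: point Triton at a custom cache manager via an
    environment variable, but only if the user has not already set one.
    (The variable name and value here are assumptions for this sketch.)"""
    if "TRITON_CACHE_MANAGER" not in os.environ:
        os.environ["TRITON_CACHE_MANAGER"] = "my_pkg.triton_cache:CustomCacheManager"

def init_executor(world_size: int) -> None:
    """Mirror of the diffed logic: single-threaded OpenMP per worker, and the
    cache-manager workaround applied only when multiple workers could race."""
    os.environ["OMP_NUM_THREADS"] = "1"
    # workaround for https://github.com/vllm-project/vllm/issues/6103
    if world_size > 1:
        maybe_set_triton_cache_manager()

init_executor(world_size=2)
print(os.environ.get("TRITON_CACHE_MANAGER"))
```

With `world_size=1` the environment is left untouched, which is the point of the commit: single-process runs keep Triton's default cache manager and only multi-process (tp > 1) runs pay for the workaround.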
