Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][Core] try to manage nccl version #3802

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 9 additions & 8 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -15,26 +15,27 @@ RUN ldconfig /usr/local/cuda-12.1/compat/

WORKDIR /workspace

# install build and runtime dependencies
COPY requirements.txt requirements.txt
# install build dependencies
COPY requirements-build.txt requirements-build.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
pip install -r requirements-build.txt

# install development dependencies
COPY requirements-dev.txt requirements-dev.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements-dev.txt

# install runtime dependencies
COPY requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt

#################### BASE BUILD IMAGE ####################


#################### EXTENSION BUILD IMAGE ####################
FROM dev AS build

# install build dependencies
COPY requirements-build.txt requirements-build.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements-build.txt

# install compiler cache to speed up compilation leveraging local or remote caching
RUN apt-get update -y && apt-get install -y ccache

Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ requires = [
"setuptools >= 49.4.0",
"torch == 2.1.2",
"wheel",
"requests", # Required for downloading nccl library
]
build-backend = "setuptools.build_meta"

Expand Down
1 change: 1 addition & 0 deletions requirements-build.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@ packaging
setuptools>=49.4.0
torch==2.1.2
wheel
requests # Required for downloading nccl library
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,4 @@ pynvml == 11.5.0
triton >= 2.1.0
outlines == 0.0.34
tiktoken == 0.6.0 # Required for DBRX tokenizer
git+https://github.com/vllm-project/vllm-nccl.git # manage NCCL version to be separate from PyTorch
10 changes: 9 additions & 1 deletion vllm/model_executor/parallel_utils/pynccl.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@
import ctypes
import datetime
import os
import site
import sys

# ===================== import region =====================
import torch
Expand All @@ -40,7 +42,13 @@
f"Loading nccl from environment variable VLLM_NCCL_SO_PATH={so_file}")
else:
if torch.version.cuda is not None:
so_file = "libnccl.so.2"
# check if we have vllm-managed nccl
if os.path.exists(os.path.join(sys.prefix, "vllm_nccl")):
so_file = os.path.join(sys.prefix, "vllm_nccl", "libnccl.so.2")
elif os.path.exists(os.path.join(site.USER_BASE, "vllm_nccl")):
so_file = os.path.join(site.USER_BASE, "vllm_nccl", "libnccl.so.2")
else:
so_file = "libnccl.so.2"
elif torch.version.hip is not None:
so_file = "librccl.so.1"
else:
Expand Down
Loading