Error in launch using a docker image #4242

Closed
1 task done
hzhaoy opened this issue Jun 12, 2024 · 1 comment · Fixed by #4461
Labels
solved This problem has been already solved

Comments

@hzhaoy
Contributor

hzhaoy commented Jun 12, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.2.dev0

Reproduction

Dockerfile: https://github.com/hiyouga/LLaMA-Factory/blob/557891debb8a64b73eea012f99780a7b76424cd5/Dockerfile

Build Command:

docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .

docker-compose.yml:

name: llm-fct

services:
  webui:
    image: llamafactory:latest
    command: ["llamafactory-cli", "webui"]
    volumes:
      - /models:/models
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    security_opt:
      - seccomp:unconfined
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: "all"
            capabilities: [gpu]
    restart: unless-stopped

Startup Command:
docker compose -f docker-compose.yml up -d

Error:
llm-fct-webui-1 | Traceback (most recent call last):
llm-fct-webui-1 |   File "/usr/local/bin/llamafactory-cli", line 5, in <module>
llm-fct-webui-1 |     from llamafactory.cli import main
llm-fct-webui-1 |   File "/app/src/llamafactory/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from .cli import VERSION
llm-fct-webui-1 |   File "/app/src/llamafactory/cli.py", line 7, in <module>
llm-fct-webui-1 |     from . import launcher
llm-fct-webui-1 |   File "/app/src/llamafactory/launcher.py", line 1, in <module>
llm-fct-webui-1 |     from llamafactory.train.tuner import run_exp
llm-fct-webui-1 |   File "/app/src/llamafactory/train/tuner.py", line 10, in <module>
llm-fct-webui-1 |     from ..model import load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/__init__.py", line 1, in <module>
llm-fct-webui-1 |     from .loader import load_config, load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/loader.py", line 13, in <module>
llm-fct-webui-1 |     from .patcher import patch_config, patch_model, patch_tokenizer, patch_valuehead_model
llm-fct-webui-1 |   File "/app/src/llamafactory/model/patcher.py", line 16, in <module>
llm-fct-webui-1 |     from .model_utils.longlora import configure_longlora
llm-fct-webui-1 |   File "/app/src/llamafactory/model/model_utils/longlora.py", line 6, in <module>
llm-fct-webui-1 |     from transformers.models.llama.modeling_llama import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 54, in <module>
llm-fct-webui-1 |     from flash_attn import flash_attn_func, flash_attn_varlen_func
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from flash_attn.flash_attn_interface import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
llm-fct-webui-1 |     import flash_attn_2_cuda as flash_attn_cuda
llm-fct-webui-1 | ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
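
The undefined symbol on the last line is a C++ name mangled under the Itanium ABI. As a reading aid, here is a minimal sketch that decodes it. `demangle_simple` is a hypothetical helper written for this note, not part of LLaMA-Factory or flash_attn, and it only handles this simple nested-name form; the standard `c++filt` tool does the same job in general.

```python
import re

# Only a few builtin type codes from the Itanium C++ ABI, enough for this illustration.
_BUILTIN_TYPES = {
    "v": "void",
    "b": "bool",
    "c": "char",
    "a": "signed char",
    "i": "int",
    "l": "long",
}


def demangle_simple(symbol: str) -> str:
    """Demangle symbols of the form _ZN <len><name>... E <builtin param codes>.

    Hypothetical helper: anything beyond this restricted form raises ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN... nested names are supported")
    pos = 3
    parts = []
    # Each name component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported component at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator for the nested name")
    # Everything after 'E' is the parameter list, one code per builtin type.
    try:
        params = [_BUILTIN_TYPES[code] for code in symbol[pos + 1:]]
    except KeyError as exc:
        raise ValueError(f"unsupported parameter code {exc}") from None
    if params == ["void"]:
        params = []  # a lone 'v' means "no parameters"
    return "::".join(parts) + "(" + ", ".join(params) + ")"


if __name__ == "__main__":
    print(demangle_simple("_ZN3c104cuda9SetDeviceEi"))  # c10::cuda::SetDevice(int)
```

The decoded name, `c10::cuda::SetDevice(int)`, belongs to PyTorch's c10 CUDA library. An undefined symbol of this kind usually suggests that the flash_attn binary was compiled against a different PyTorch build than the one installed in the image, although this issue does not confirm the root cause.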

Expected behavior

The container starts successfully.

Others

There may be some solutions in oobabooga/text-generation-webui#4182.
I also found that everything works fine when using nvcr.io/nvidia/pytorch:24.01-py3 as the base image instead of nvcr.io/nvidia/pytorch:24.02-py3.
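
To compare the two base images, one option is to print the installed package versions inside each container. The following is a minimal sketch, not something taken from the project; the distribution names in `PACKAGES` are assumptions and may need adjusting to match `pip list` in the image.

```python
from importlib import metadata

# Assumed distribution names; adjust them to match `pip list` inside the image.
PACKAGES = ("torch", "flash-attn", "transformers")


def installed_versions(names=PACKAGES):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in a container built from each base image and comparing the output should show whether the torch and flash-attn versions differ between the two.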

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 12, 2024
@hiyouga
Owner

hiyouga commented Jun 12, 2024

Please try again with the latest Dockerfile.

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 12, 2024