
[Bug]: LLaVa Next Value Error - "Incorrect type of image sizes" when running in Docker #5868

Closed
FennFlyer opened this issue Jun 26, 2024 · 5 comments
FennFlyer commented Jun 26, 2024

Your current environment

Current Environment

Docker image: vllm/vllm-openai:v0.5.0.post1

Running as part of a Docker Compose stack. Relevant sections of my docker-compose.yaml are below. This is part of a multi-model deployment with other vLLM-based text generation/chat models running successfully behind a Traefik reverse proxy. I split the instance running LLaVA 1.6 into its own service in the docker-compose.yaml to test the different startup arguments it requires; it is the third service in the file. I have included the relevant .env file entries as well.

###docker-compose.yaml###

services:


  reverseproxy:
    image: ${PROXY_IMAGE}
    container_name: reverseproxy
    # Enables the web UI and tells Traefik to listen to docker
    command: --api.insecure=true --providers.docker --api.dashboard=true
    ports:
      # The HTTP port
      - "80:80"
      # The Web UI (enabled by --api.insecure=true)
      - "8080:8080"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock
    networks: 
      - llm-net


  ## Current best solution for chat/text generation models
  ## Change GPU device_ids if necessary
  vllm-server:
    depends_on:
     - reverseproxy
    image: ${VLLM_IMAGE}
    container_name: vllm-server
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
    volumes:
      - ${MODEL_VOL}/${VLLM_MODEL_ID}:/vllm-workspace/${VLLM_MODEL_ID}
    command: ["--model", "${VLLM_MODEL_ID}", "--gpu-memory-utilization", "0.75", "--host", "0.0.0.0", "--root-path", "/vllm-server"]
    labels:
      - traefik.enable=true
      - traefik.http.routers.vllm-server.rule=PathPrefix(`/vllm-server`)
      - traefik.http.routers.vllm-server.middlewares=vllm-server-stripprefix
      - traefik.http.middlewares.vllm-server-stripprefix.stripprefix.prefixes=/vllm-server
      - traefik.http.services.vllm-server.loadbalancer.server.port=8000
    networks: 
      - llm-net
    # ports:
    #  - 8000:8000


  ## Testing llava serving with vllm
  ## Change GPU device_ids if necessary
  vllm-llava-server:
    depends_on:
     - reverseproxy
    image: ${VLLM_IMAGE}
    container_name: vllm-llava-server
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
    volumes:
      - ${MODEL_VOL}/${VLLM_IMAGE_MODEL_ID}:/vllm-workspace/${VLLM_IMAGE_MODEL_ID}
    command: ["--model", "${VLLM_IMAGE_MODEL_ID}", "--gpu-memory-utilization", "0.75", "--host", "0.0.0.0", "--root-path", "/vllm-llava-server",
      "--image-input-type", "pixel_values", "--image-token-id", "32000", "--image-input-shape", "1,3,336,336", "--image-feature-size", "576",
      "--chat-template", "template_llava.jinja"]
    labels:
      - traefik.enable=true
      - traefik.http.routers.vllm-llava-server.rule=PathPrefix(`/vllm-llava-server`)
      - traefik.http.routers.vllm-llava-server.middlewares=vllm-llava-server-stripprefix
      - traefik.http.middlewares.vllm-llava-server-stripprefix.stripprefix.prefixes=/vllm-llava-server
      - traefik.http.services.vllm-llava-server.loadbalancer.server.port=8000
    networks: 
      - llm-net
    # ports:
    #  - 8000:8000
###.env file###

MODEL_VOL=/home/<intermediate_paths>/models
VLLM_MODEL_ID=Meta-Llama-3-8B-Instruct
VLLM_IMAGE_MODEL_ID=llava-v1.6-mistral-7b-hf
PROXY_IMAGE=traefik
VLLM_IMAGE=vllm/vllm-openai:v0.5.0.post1

VLLM_IMAGE_MODEL_ID points to a directory cloned from https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf (with template_llava.jinja added); a sketch of how such a clone can be produced follows the listing. The directory structure is:

###llava-v1.6-mistral-7b-hf directory structure###

config.json
generation_config.json
.git
.gitattributes
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
preprocessor_config.json
README.md
special_tokens_map.json
template_llava.jinja
tokenizer_config.json
tokenizer.json
tokenizer.model
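
For reference, a checkout like this can be produced roughly as follows (assuming git-lfs is installed; copying the chat template into the model directory is this deployment's own addition, not part of the upstream repo):

git lfs install
git clone https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf
# template_llava.jinja taken from vLLM's examples directory
cp template_llava.jinja llava-v1.6-mistral-7b-hf/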

🐛 Describe the bug

Bug description

On starting the service with docker compose --env-file .env.llava up reverseproxy vllm-llava-server, it appears to go through the usual startup but then throws a ValueError; see below for the full text and STDOUT. I have included all of the startup values that appear to be required when instantiating a new LLM object, per https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py. Am I missing something from my command entry in the docker-compose.yaml?

vllm-llava-server  | INFO 06-26 18:28:25 api_server.py:177] vLLM API server version 0.5.0.post1
vllm-llava-server  | INFO 06-26 18:28:25 api_server.py:178] args: Namespace(host='0.0.0.0', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, chat_template='template_llava.jinja', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path='/vllm-llava-server', middleware=[], model='llava-v1.6-mistral-7b-hf', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.75, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, device='auto', image_input_type='pixel_values', image_token_id=32000, image_input_shape='1,3,336,336', image_feature_size=576, image_processor=None, image_processor_revision=None, disable_image_processor=False, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, model_loader_extra_config=None, preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
vllm-llava-server  | INFO 06-26 18:28:25 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='llava-v1.6-mistral-7b-hf', speculative_config=None, tokenizer='llava-v1.6-mistral-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=llava-v1.6-mistral-7b-hf)
vllm-llava-server  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
vllm-llava-server  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
vllm-llava-server  | INFO 06-26 18:29:15 model_runner.py:160] Loading model weights took 14.1020 GB
vllm-llava-server  | [rank0]: Traceback (most recent call last):
vllm-llava-server  | [rank0]:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
vllm-llava-server  | [rank0]:     return _run_code(code, main_globals, None,
vllm-llava-server  | [rank0]:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
vllm-llava-server  | [rank0]:     exec(code, run_globals)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 196, in <module>
vllm-llava-server  | [rank0]:     engine = AsyncLLMEngine.from_engine_args(
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 398, in from_engine_args
vllm-llava-server  | [rank0]:     engine = cls(
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 349, in __init__
vllm-llava-server  | [rank0]:     self.engine = self._init_engine(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 473, in _init_engine
vllm-llava-server  | [rank0]:     return engine_class(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 236, in __init__
vllm-llava-server  | [rank0]:     self._initialize_kv_caches()
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 313, in _initialize_kv_caches
vllm-llava-server  | [rank0]:     self.model_executor.determine_num_available_blocks())
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 75, in determine_num_available_blocks
vllm-llava-server  | [rank0]:     return self.driver_worker.determine_num_available_blocks()
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server  | [rank0]:     return func(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 162, in determine_num_available_blocks
vllm-llava-server  | [rank0]:     self.model_runner.profile_run()
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server  | [rank0]:     return func(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 844, in profile_run
vllm-llava-server  | [rank0]:     self.execute_model(seqs, kv_caches)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server  | [rank0]:     return func(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 749, in execute_model
vllm-llava-server  | [rank0]:     hidden_states = model_executable(
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
vllm-llava-server  | [rank0]:     return self._call_impl(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
vllm-llava-server  | [rank0]:     return forward_call(*args, **kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llava_next.py", line 383, in forward
vllm-llava-server  | [rank0]:     image_input = self._parse_and_validate_image_input(**kwargs)
vllm-llava-server  | [rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llava_next.py", line 196, in _parse_and_validate_image_input
vllm-llava-server  | [rank0]:     raise ValueError("Incorrect type of image sizes. "
vllm-llava-server  | [rank0]: ValueError: Incorrect type of image sizes. Got type: <class 'NoneType'>
vllm-llava-server exited with code 0
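
In case it helps isolate whether Compose itself is a factor, the same flags correspond roughly to this single docker run invocation (a sketch only; MODEL_VOL as in the .env above):

docker run --rm --gpus '"device=0"' \
  -v ${MODEL_VOL}/llava-v1.6-mistral-7b-hf:/vllm-workspace/llava-v1.6-mistral-7b-hf \
  vllm/vllm-openai:v0.5.0.post1 \
  --model llava-v1.6-mistral-7b-hf \
  --gpu-memory-utilization 0.75 --host 0.0.0.0 \
  --image-input-type pixel_values --image-token-id 32000 \
  --image-input-shape 1,3,336,336 --image-feature-size 576 \
  --chat-template template_llava.jinja
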
FennFlyer added the bug (Something isn't working) label on Jun 26, 2024
DarkLight1337 (Member) commented Jun 27, 2024

Does the model fail upon startup? Otherwise, can you provide an example OpenAI API request that triggers this error?

Can you try out #5214 and see if you get the same problem? The profile_run logic should be fixed there.
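
For context, an image request against this deployment would look something along these lines (the host path, prompt, and image URL are placeholders; this assumes the OpenAI Vision-style chat format that vLLM's OpenAI-compatible server accepts):

curl http://localhost/vllm-llava-server/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava-v1.6-mistral-7b-hf",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
      ]
    }]
  }'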

FennFlyer (Author) commented Jun 28, 2024

Sure, do you have a recommended way to build the container? Is it just the usual clone and Docker build on the branch, or does your team have any build magic happening that I need to know about? Right now I'm just pulling straight from Docker Hub.

DarkLight1337 (Member) commented
Sorry I missed this - I haven't used the Docker container myself, but from my understanding, you can use the Dockerfile from the main branch directly.
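
For reference, a from-source build would look roughly like this (the pr-5214 branch name and :dev tag are arbitrary examples; the build target follows the Dockerfile in the vLLM repo):

git clone https://github.com/vllm-project/vllm.git
cd vllm
# check out the PR branch instead of main to test #5214
git fetch origin pull/5214/head:pr-5214 && git checkout pr-5214
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai:dev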

DarkLight1337 (Member) commented
v0.5.1 has been released so you can directly use the official Docker image now.
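
For this setup, that would roughly mean bumping the tag in the .env file and recreating the service, e.g.:

VLLM_IMAGE=vllm/vllm-openai:v0.5.1

docker compose --env-file .env.llava pull vllm-llava-server
docker compose --env-file .env.llava up -d reverseproxy vllm-llava-server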

FennFlyer (Author) commented
Thank you, I was out on holiday last week so I will test the new image ASAP!
