Your current environment

Docker image: vllm/vllm-openai:v0.5.0.post1

Running as part of a Docker Compose stack. Relevant sections of my docker-compose.yaml are below. This is part of a multi-model deployment with other vLLM-based text generation/chat models running successfully behind a Traefik reverse proxy. I split the instance running LLaVA 1.6 out into its own service in the docker-compose.yaml (it is the third service in the file) to test the different command arguments it requires on startup. I have included the .env file entries as well. VLLM_IMAGE_MODEL_ID points to a cloned Hugging Face directory from https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf, with template_llava.jinja added.
### docker-compose.yaml ###
services:
  reverseproxy:
    image: ${PROXY_IMAGE}
    container_name: reverseproxy
    # Enables the web UI and tells Traefik to listen to docker
    command: --api.insecure=true --providers.docker --api.dashboard=true
    ports:
      # The HTTP port
      - "80:80"
      # The Web UI (enabled by --api.insecure=true)
      - "8080:8080"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - llm-net

  ## Current best solution for chat/text generation models
  ## Change GPU device_ids if necessary
  vllm-server:
    depends_on:
      - reverseproxy
    image: ${VLLM_IMAGE}
    container_name: vllm-server
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
    volumes:
      - ${MODEL_VOL}/${VLLM_MODEL_ID}:/vllm-workspace/${VLLM_MODEL_ID}
    command: ["--model", "${VLLM_MODEL_ID}", "--gpu-memory-utilization", "0.75", "--host", "0.0.0.0", "--root-path", "/vllm-server"]
    labels:
      - traefik.enable=true
      - traefik.http.routers.vllm-server.rule=PathPrefix(`/vllm-server`)
      - traefik.http.routers.vllm-server.middlewares=vllm-server-stripprefix
      - traefik.http.middlewares.vllm-server-stripprefix.stripprefix.prefixes=/vllm-server
      - traefik.http.services.vllm-server.loadbalancer.server.port=8000
    networks:
      - llm-net
    # ports:
    #   - 8000:8000

  ## Testing llava serving with vllm
  ## Change GPU device_ids if necessary
  vllm-llava-server:
    depends_on:
      - reverseproxy
    image: ${VLLM_IMAGE}
    container_name: vllm-llava-server
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
    volumes:
      - ${MODEL_VOL}/${VLLM_IMAGE_MODEL_ID}:/vllm-workspace/${VLLM_IMAGE_MODEL_ID}
    command: ["--model", "${VLLM_IMAGE_MODEL_ID}", "--gpu-memory-utilization", "0.75", "--host", "0.0.0.0", "--root-path", "/vllm-llava-server",
              "--image-input-type", "pixel_values", "--image-token-id", "32000", "--image-input-shape", "1,3,336,336", "--image-feature-size", "576",
              "--chat-template", "template_llava.jinja"]
    labels:
      - traefik.enable=true
      - traefik.http.routers.vllm-llava-server.rule=PathPrefix(`/vllm-llava-server`)
      - traefik.http.routers.vllm-llava-server.middlewares=vllm-llava-server-stripprefix
      - traefik.http.middlewares.vllm-llava-server-stripprefix.stripprefix.prefixes=/vllm-llava-server
      - traefik.http.services.vllm-llava-server.loadbalancer.server.port=8000
    networks:
      - llm-net
    # ports:
    #   - 8000:8000
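For reference, here is a minimal sketch of the kind of entries .env.llava contains, using the image tag noted above and otherwise placeholder values (the real entries are not reproduced here):

### .env.llava (illustrative sketch; values other than VLLM_IMAGE are placeholders) ###
# Traefik image used by the reverseproxy service (version is a placeholder)
PROXY_IMAGE=traefik:v2.11
# vLLM OpenAI-compatible server image
VLLM_IMAGE=vllm/vllm-openai:v0.5.0.post1
# Host directory holding the cloned Hugging Face model repos (placeholder path)
MODEL_VOL=/opt/models
# Text/chat model directory served by vllm-server (placeholder name)
VLLM_MODEL_ID=<chat-model-dir>
# LLaVA 1.6 model directory served by vllm-llava-server
VLLM_IMAGE_MODEL_ID=llava-v1.6-mistral-7b-hf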
🐛 Describe the bug

On starting the service with docker compose --env-file .env.llava up reverseproxy vllm-llava-server, it appears to do the usual startup but then throws a ValueError; the full text and STDOUT are below. I have included all of the startup values that appear to be required when instantiating a new LLM object, per https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py. Am I missing something in my command entry in the docker-compose.yaml?
vllm-llava-server | INFO 06-26 18:28:25 api_server.py:177] vLLM API server version 0.5.0.post1
vllm-llava-server | INFO 06-26 18:28:25 api_server.py:178] args: Namespace(host='0.0.0.0', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, chat_template='template_llava.jinja', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path='/vllm-llava-server', middleware=[], model='llava-v1.6-mistral-7b-hf', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.75, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, device='auto', image_input_type='pixel_values', image_token_id=32000, image_input_shape='1,3,336,336', image_feature_size=576, image_processor=None, image_processor_revision=None, disable_image_processor=False, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, model_loader_extra_config=None, preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
vllm-llava-server | INFO 06-26 18:28:25 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='llava-v1.6-mistral-7b-hf', speculative_config=None, tokenizer='llava-v1.6-mistral-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=llava-v1.6-mistral-7b-hf)
vllm-llava-server | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
vllm-llava-server | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
vllm-llava-server | INFO 06-26 18:29:15 model_runner.py:160] Loading model weights took 14.1020 GB
vllm-llava-server | [rank0]: Traceback (most recent call last):
vllm-llava-server | [rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
vllm-llava-server | [rank0]: return _run_code(code, main_globals, None,
vllm-llava-server | [rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
vllm-llava-server | [rank0]: exec(code, run_globals)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 196, in <module>
vllm-llava-server | [rank0]: engine = AsyncLLMEngine.from_engine_args(
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 398, in from_engine_args
vllm-llava-server | [rank0]: engine = cls(
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 349, in __init__
vllm-llava-server | [rank0]: self.engine = self._init_engine(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 473, in _init_engine
vllm-llava-server | [rank0]: return engine_class(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 236, in __init__
vllm-llava-server | [rank0]: self._initialize_kv_caches()
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 313, in _initialize_kv_caches
vllm-llava-server | [rank0]: self.model_executor.determine_num_available_blocks())
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 75, in determine_num_available_blocks
vllm-llava-server | [rank0]: return self.driver_worker.determine_num_available_blocks()
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server | [rank0]: return func(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 162, in determine_num_available_blocks
vllm-llava-server | [rank0]: self.model_runner.profile_run()
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server | [rank0]: return func(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 844, in profile_run
vllm-llava-server | [rank0]: self.execute_model(seqs, kv_caches)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
vllm-llava-server | [rank0]: return func(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 749, in execute_model
vllm-llava-server | [rank0]: hidden_states = model_executable(
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
vllm-llava-server | [rank0]: return self._call_impl(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
vllm-llava-server | [rank0]: return forward_call(*args, **kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llava_next.py", line 383, in forward
vllm-llava-server | [rank0]: image_input = self._parse_and_validate_image_input(**kwargs)
vllm-llava-server | [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llava_next.py", line 196, in _parse_and_validate_image_input
vllm-llava-server | [rank0]: raise ValueError("Incorrect type of image sizes. "
vllm-llava-server | [rank0]: ValueError: Incorrect type of image sizes. Got type: <class 'NoneType'>
vllm-llava-server exited with code 0
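For context, the Traefik labels above route requests and strip the /vllm-llava-server prefix, so once the server starts a request would go through the proxy roughly like this (a sketch assuming this vLLM version accepts OpenAI Vision-style image_url content parts; the image URL is a placeholder):

# Hypothetical request through the Traefik route once the server is up
curl http://localhost/vllm-llava-server/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llava-v1.6-mistral-7b-hf",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
          ]
        }]
      }'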
Sure, do you have a recommended way to build the container? Just the usual clone and Docker build on the branch, or does your team have any build magic happening that I need to know about? Right now I'm just pulling straight from Docker Hub.
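By "the usual clone and Docker build" I mean roughly the following, a sketch assuming the Dockerfile at the root of the vLLM repo and a placeholder branch name:

# Clone the repo, check out the branch in question, and build the OpenAI server image
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout <branch-to-test>
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai:dev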