
support janus model #1140

Open · wants to merge 10 commits into main from ea/janus
Conversation

@eaidova (Collaborator) commented Feb 4, 2025

What does this PR do?

Conversion requires a fix on the optimum side: huggingface/optimum#2179

```python
from io import BytesIO
from pathlib import Path

import requests
from janus.models import VLChatProcessor
from PIL import Image
from transformers import TextStreamer

from optimum.intel.openvino import OVModelForVisualCausalLM

model_id = "deepseek-ai/Janus-Pro-1B"

model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = VLChatProcessor.from_pretrained(model_id)
```

Multimodal understanding

```python
input_prompt = "Describe image in details"
image_path = Path("cat_in_box.png")

if not image_path.exists():
    response = requests.get(
        "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
    )
    image = Image.open(BytesIO(response.content)).convert("RGB")
    image.save(image_path)

image = Image.open(image_path)

inputs = model.preprocess_inputs(input_prompt, image, processor)
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(**inputs, streamer=streamer, max_new_tokens=100, do_sample=False)
```
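For readers unfamiliar with streaming generation, the streamer pattern can be illustrated with a minimal stand-in (a toy sketch; `ListStreamer` and `fake_generate` are hypothetical names, not part of transformers or optimum-intel):

```python
# Toy stand-in for a text streamer: a callback object that the generate
# loop pushes output chunks into as they are produced, so the caller can
# display text incrementally instead of waiting for the full result.
class ListStreamer:
    def __init__(self):
        self.chunks = []

    def put(self, text):
        # transformers' TextStreamer receives token ids and decodes them;
        # this toy version just collects already-decoded strings.
        self.chunks.append(text)

    def end(self):
        # Called once generation finishes (no-op here).
        pass


def fake_generate(tokens, streamer):
    # Stands in for model.generate(..., streamer=streamer): emit chunks
    # one by one as they are "generated".
    for t in tokens:
        streamer.put(t)
    streamer.end()
    return tokens


streamer = ListStreamer()
fake_generate(["The ", "cat ", "sits."], streamer)
print("".join(streamer.chunks))  # prints: The cat sits.
```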

Answer:

The image shows a gray tabby cat lying inside an open cardboard box on a light-colored carpet. The cat is lying on its back with its belly exposed, legs up in the air, and its tail curled around its body. The background includes a beige couch and a bright, airy room with natural light streaming in, creating a cozy and relaxed atmosphere.

Text-to-image generation

```python
image_gen_prompt = "A cute and adorable baby fox with big brown eyes, autumn leaves in the background enchanting,immortal,fluffy, shiny mane,Petals,fairyism,unreal engine 5 and Octane Render,highly detailed, photorealistic, cinematic, natural colors."

images = model.generate_image(processor, image_gen_prompt, parallel_size=1)
images[0].save("fox.png")
```

Generated image: fox.png

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

tests/openvino/utils_tests.py — review thread resolved (outdated)
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova eaidova force-pushed the ea/janus branch 4 times, most recently from 9113be6 to fe6dac8 Compare February 5, 2025 15:50
@AlexKoff88 (Collaborator) commented:
I wonder why we need to keep VLChatProcessor instance outside the model class and if we can move it inside?

@eaidova (Collaborator, Author) commented Feb 6, 2025

> I wonder why we need to keep VLChatProcessor instance outside the model class and if we can move it inside?

I'm not sure I understand your question. This is the standard preprocessing/postprocessing component for transformers-based models (like tokenizers, feature extractors, image processors, etc.); it is usually an independent object (except in the diffusers case). For VLM models it may be helpful to move it closer to the model, as the two are becoming more complicated and tightly bound, so possibly we can consider keeping processors for other models as well (it may also help align the results of save_pretrained and optimum-cli, which save processors and tokenizers when they are available).

@AlexKoff88 (Collaborator) commented:
```python
model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = VLChatProcessor.from_pretrained(model_id)
```

To clarify, I am just looking at the code in the PR description and wondering why it could not look like this:

```python
model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
...
inputs = model.preprocess_inputs(input_prompt, image)
streamer = TextStreamer(model.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, streamer=streamer, max_new_tokens=100, do_sample=False)
...
images = model.generate_image(image_gen_prompt, parallel_size=1)
```

So the processor is loaded inside the model and hidden from the user, but it can still be acquired via model.processor.

But from what I understood, your implementation is aligned with diffusers, right?
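The pattern proposed above (processor loaded internally but still reachable) can be sketched with a toy class; this is a hypothetical illustration, not the actual optimum-intel API, and `BundledModel` plus its stubbed internals are invented names:

```python
# Hypothetical sketch of bundling a processor inside the model wrapper
# while still exposing it via a property, as suggested in the comment above.
class BundledModel:
    def __init__(self, processor):
        self._processor = processor

    @classmethod
    def from_pretrained(cls, model_id):
        # A real implementation would call something like
        # VLChatProcessor.from_pretrained(model_id) here; stubbed for the sketch.
        return cls(processor=f"processor-for-{model_id}")

    @property
    def processor(self):
        # Escape hatch: the user can still reach the processor directly.
        return self._processor

    def preprocess_inputs(self, prompt, image=None):
        # Delegates to the internal processor instead of requiring the
        # caller to pass one in explicitly.
        return {"prompt": prompt, "image": image, "processor": self._processor}


model = BundledModel.from_pretrained("deepseek-ai/Janus-Pro-1B")
inputs = model.preprocess_inputs("Describe image in details")
```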

@eaidova (Collaborator, Author) commented Feb 12, 2025

@IlyasMoutawwakil @echarlaix could you please take a look?

@echarlaix (Collaborator) left a comment:

Thanks for the addition @eaidova !

4 participants