
[New Model][Format]: Support the HF-version of Pixtral #8685

Closed · 1 task done
mgoin opened this issue Sep 21, 2024 · 8 comments · Fixed by #9036
Labels: new model (Requests to new models)

Comments

mgoin (Member) commented Sep 21, 2024

The model to consider.

vLLM supports Mistral's "consolidated" format for the Pixtral model, found at: https://huggingface.co/mistral-community/pixtral-12b-240910

However, when HF implemented Pixtral in Transformers, they used a different format that leverages the existing Llava model structure. Model example: https://huggingface.co/mistral-community/pixtral-12b

HF PR reference: huggingface/transformers#33449

Supporting the HF version means we can produce quantized versions of the model with LLM Compressor.
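
For context, a minimal sketch of what that LLM Compressor flow might look like once the HF format is supported; this is not from the issue, and the recipe (in particular the vision-tower/projector ignore patterns) is an assumption rather than a verified workflow:

```python
# Hedged sketch only: a typical LLM Compressor one-shot FP8 flow that HF-format
# support would unlock. The ignore patterns for the vision tower/projector are
# assumptions, not a verified recipe.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"  # HF-format checkpoint from this issue

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

# FP8-dynamic needs no calibration data; the output directory would then hold a
# quantized checkpoint for vLLM to load once HF-format Pixtral is supported.
oneshot(model=MODEL_ID, recipe=recipe, output_dir="pixtral-12b-FP8-Dynamic")
```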

The closest model vLLM already supports.

No response

What's your difficulty of supporting the model you want?

Easy to moderate; all operations should already be implemented inside of vLLM.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
mgoin added the new model label on Sep 21, 2024
mgoin (Member, Author) commented Sep 21, 2024

Reichenbachian commented:

Do you have any suggested workarounds at this time?

mgoin (Member, Author) commented Sep 30, 2024

@Reichenbachian you can use the official Mistral consolidated checkpoint with vLLM if you want to use Pixtral.
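
For reference, running the consolidated checkpoint with vLLM's offline API looks roughly like the sketch below; the model name, image limit, and prompt are illustrative, not taken from this thread:

```python
# Minimal sketch of running the consolidated-format Pixtral checkpoint with
# vLLM's offline API; model name, image limit, and prompt are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Pixtral-12B-2409",
    tokenizer_mode="mistral",          # use Mistral's native tokenizer/format
    limit_mm_per_prompt={"image": 4},  # allow multiple images per prompt
)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```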

As for supporting the HF format, we are still waiting on someone to contribute the implementation.

Reichenbachian commented:

Hey @mgoin, we have a fine-tuned version that has gone through the Transformers library, so using the consolidated checkpoint isn't going to work for us, unfortunately. If you can point me in the right direction though, I might be able to implement it.

Reichenbachian commented:

Otherwise, we may just retrain with Llama 3.2 Vision. Sorry for the out-of-scope question, but are you aware of any similar issues there?

mgoin (Member, Author) commented Oct 1, 2024

Thanks @Reichenbachian, we definitely want to have the implementation, for similar reasons. The key part that needs to be implemented is registering the Pixtral vision tower in the _init_vision_tower function in llava.py.
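
A rough sketch of the kind of change, assuming llava.py keeps dispatching on the vision config class; the new branch uses HF's PixtralVisionModel purely as a stand-in for whatever vLLM-native vision tower the real implementation ends up adding:

```python
# Hypothetical sketch of the dispatch in vllm/model_executor/models/llava.py.
# The CLIP/SigLIP branches paraphrase the existing code; the Pixtral branch uses
# HF's PixtralVisionModel as a stand-in, whereas a real vLLM implementation would
# add a native vision tower (tensor-parallel layers) alongside clip.py/siglip.py.
from transformers import (CLIPVisionConfig, PixtralVisionConfig,
                          PixtralVisionModel, SiglipVisionConfig)

from vllm.model_executor.models.clip import CLIPVisionModel
from vllm.model_executor.models.siglip import SiglipVisionModel


def _init_vision_tower(hf_config):
    vision_config = hf_config.vision_config

    if isinstance(vision_config, CLIPVisionConfig):
        return CLIPVisionModel(vision_config)
    if isinstance(vision_config, SiglipVisionConfig):
        return SiglipVisionModel(vision_config)
    if isinstance(vision_config, PixtralVisionConfig):
        # New branch for HF-format Pixtral checkpoints.
        return PixtralVisionModel(vision_config)

    raise NotImplementedError(f"Unsupported vision config: {type(vision_config)}")
```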

Llama 3.2 Vision is supported and should be fine for that use case, but it is less optimized, since the cross-attention architecture it uses is much less common than the Llava-style architecture most other VLMs have been using.

mgoin (Member, Author) commented Oct 3, 2024

I have started a draft in #9036 where we can load the weights properly. It still needs more work to perform inference correctly.

mgoin (Member, Author) commented Oct 4, 2024

@Reichenbachian if you simply want to run your own fine-tuned version, a user has written a conversion script from HF format to Mistral format: https://github.com/spring-anth/transform_pixtral/blob/main/convert_hf_transformers_pixtral_model_to_vllm_compatible_version.py
So you could just add an extra pass to convert your model to the supported format; see the sketch below.
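
Conceptually, that extra pass is a key-remapping step over the HF safetensors shards, roughly like the sketch below; the actual HF-to-consolidated key mapping lives in the linked script and is not reproduced here, so rename_key is only a placeholder:

```python
# Illustrative only: gather HF-format shards and write one consolidated file with
# renamed keys. The real HF-to-consolidated naming scheme is in the linked script;
# rename_key here is a placeholder that leaves keys untouched.
import glob

from safetensors.torch import load_file, save_file


def rename_key(hf_key: str) -> str:
    # Placeholder: map an HF parameter name to the consolidated-format name.
    return hf_key


state_dict = {}
for shard in sorted(glob.glob("pixtral-hf/*.safetensors")):
    for key, tensor in load_file(shard).items():
        state_dict[rename_key(key)] = tensor

save_file(state_dict, "consolidated.safetensors")
```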
