
[New Model][Format]: Support the HF-version of Pixtral #8685

Closed · 1 task done
mgoin opened this issue Sep 21, 2024 · 8 comments · Fixed by #9036
Labels: new model (Requests to new models)

Comments

mgoin (Member) commented Sep 21, 2024

The model to consider.

vLLM supports Mistral's "consolidated" format for the Pixtral model, found at: https://huggingface.co/mistral-community/pixtral-12b-240910

However, when HF implemented Pixtral in Transformers, they used a different format that leverages the existing Llava model structure. Model example: https://huggingface.co/mistral-community/pixtral-12b

HF PR reference: huggingface/transformers#33449

Supporting the HF version means we can produce quantized versions of the model with LLM Compressor.
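
For context, a minimal sketch of what that LLM Compressor flow might look like once the HF format is supported; this is not from the issue, and the recipe (in particular the vision-tower/projector ignore patterns) is an assumption rather than a verified workflow:

```python
# Hedged sketch only: a typical LLM Compressor one-shot FP8 flow that HF-format
# support would unlock. The ignore patterns for the vision tower/projector are
# assumptions, not a verified recipe.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"  # HF-format checkpoint from this issue

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

# FP8-dynamic needs no calibration data; the output directory would then hold a
# quantized checkpoint for vLLM to load once HF-format Pixtral is supported.
oneshot(model=MODEL_ID, recipe=recipe, output_dir="pixtral-12b-FP8-Dynamic")
```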

The closest model vLLM already supports.

No response

What's your difficulty of supporting the model you want?

Easy to moderate; all operations should already be implemented inside of vLLM.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
mgoin added the new model label on Sep 21, 2024
mgoin (Member, Author) commented Sep 21, 2024

Reichenbachian commented:

Do you have any suggested workarounds at this time?

mgoin (Member, Author) commented Sep 30, 2024

@Reichenbachian you can use the official Mistral consolidated checkpoint with vLLM if you want to use Pixtral.
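
For reference, running the consolidated checkpoint with vLLM's offline API looks roughly like the sketch below; the model name, image limit, and prompt are illustrative, not taken from this thread:

```python
# Minimal sketch of running the consolidated-format Pixtral checkpoint with
# vLLM's offline API; model name, image limit, and prompt are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Pixtral-12B-2409",
    tokenizer_mode="mistral",          # use Mistral's native tokenizer/format
    limit_mm_per_prompt={"image": 4},  # allow multiple images per prompt
)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```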

As for supporting the HF format, we are still waiting on someone to contribute the implementation.

Reichenbachian commented:

Hey @mgoin, we have a fine-tuned version that has gone through the Transformers library, so using the consolidated checkpoint isn't going to work for us, unfortunately. If you can point me in the right direction though, I might be able to implement it.

Reichenbachian commented:

Otherwise, we may just retrain with Llama 3.2 Vision. Sorry for the out-of-scope question, but are you aware of any similar issues there?

mgoin (Member, Author) commented Oct 1, 2024

Thanks @Reichenbachian, we definitely want to have the implementation, for similar reasons. The key part that needs to be implemented is registering the Pixtral vision tower in the _init_vision_tower function in llava.py.
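
A rough sketch of the kind of change, assuming llava.py keeps dispatching on the vision config class; the new branch uses HF's PixtralVisionModel purely as a stand-in for whatever vLLM-native vision tower the real implementation ends up adding:

```python
# Hypothetical sketch of the dispatch in vllm/model_executor/models/llava.py.
# The CLIP/SigLIP branches paraphrase the existing code; the Pixtral branch uses
# HF's PixtralVisionModel as a stand-in, whereas a real vLLM implementation would
# add a native vision tower (tensor-parallel layers) alongside clip.py/siglip.py.
from transformers import (CLIPVisionConfig, PixtralVisionConfig,
                          PixtralVisionModel, SiglipVisionConfig)

from vllm.model_executor.models.clip import CLIPVisionModel
from vllm.model_executor.models.siglip import SiglipVisionModel


def _init_vision_tower(hf_config):
    vision_config = hf_config.vision_config

    if isinstance(vision_config, CLIPVisionConfig):
        return CLIPVisionModel(vision_config)
    if isinstance(vision_config, SiglipVisionConfig):
        return SiglipVisionModel(vision_config)
    if isinstance(vision_config, PixtralVisionConfig):
        # New branch for HF-format Pixtral checkpoints.
        return PixtralVisionModel(vision_config)

    raise NotImplementedError(f"Unsupported vision config: {type(vision_config)}")
```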

Llama 3.2 Vision is supported and should be fine for that use case, but it is less optimized, since the cross-attention architecture it uses is much less common than the Llava-style architecture most other VLMs have been using.

mgoin (Member, Author) commented Oct 3, 2024

I have started a draft in #9036 where we can load the weights properly. It still needs more work to perform inference correctly.

mgoin (Member, Author) commented Oct 4, 2024

@Reichenbachian if you simply want to run your own fine-tuned version, a user has written a conversion script from HF format to Mistral format: https://github.com/spring-anth/transform_pixtral/blob/main/convert_hf_transformers_pixtral_model_to_vllm_compatible_version.py
So you could just add an extra pass to convert your model to the supported format; see the sketch below.
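
Conceptually, that extra pass is a key-remapping step over the HF safetensors shards, roughly like the sketch below; the actual HF-to-consolidated key mapping lives in the linked script and is not reproduced here, so rename_key is only a placeholder:

```python
# Illustrative only: gather HF-format shards and write one consolidated file with
# renamed keys. The real HF-to-consolidated naming scheme is in the linked script;
# rename_key here is a placeholder that leaves keys untouched.
import glob

from safetensors.torch import load_file, save_file


def rename_key(hf_key: str) -> str:
    # Placeholder: map an HF parameter name to the consolidated-format name.
    return hf_key


state_dict = {}
for shard in sorted(glob.glob("pixtral-hf/*.safetensors")):
    for key, tensor in load_file(shard).items():
        state_dict[rename_key(key)] = tensor

save_file(state_dict, "consolidated.safetensors")
```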
