[New Model][Format]: Support the HF-version of Pixtral #8685
Do you have any suggested workarounds at this time?
@Reichenbachian you can use the official mistral consolidated checkpoint with vllm if you want to use pixtral. As for supporting the HF format, we are still waiting on someone to contribute the implementation.
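(For reference, running the consolidated checkpoint looks roughly like the sketch below, based on vLLM's Pixtral example; the prompt, image URL, and sampling settings are placeholders.)

```python
# A minimal sketch based on vLLM's Pixtral example; prompt, image URL, and
# sampling settings are placeholders.
from vllm import LLM
from vllm.sampling_params import SamplingParams

# tokenizer_mode="mistral" tells vLLM to use the consolidated-format tokenizer.
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/237/200/300"}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```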
Hey @mgoin we have a fine-tuned version that has gone through the transformers library, so using the consolidated checkpoint isn't going to work for us unfortunately. If you can point me in the right direction though, I might be able to implement it.
Otherwise, we may just retrain with Llama 3.2 Vision. Sorry for the out-of-scope question, but are you aware of any similar issues there?
Thanks @Reichenbachian, we definitely want to have the implementation, for similar reasons. The key part that needs to be implemented is registering the Pixtral vision tower. Llama 3.2 Vision is supported and should be fine for that use case, but it is less optimized, since the cross-attention architecture it uses is much less common than the Llava-style architecture most other VLMs have been using.
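As a rough sketch, an out-of-tree implementation could be hooked in through vLLM's ModelRegistry (the PixtralHFForConditionalGeneration class and its module below are hypothetical placeholders for the implementation being discussed; the architecture string is assumed to match what the HF-format checkpoint's config declares):

```python
# A minimal sketch of out-of-tree model registration in vLLM; the
# PixtralHFForConditionalGeneration class and the pixtral_hf module are
# hypothetical placeholders for the implementation discussed here.
from vllm import ModelRegistry

from pixtral_hf import PixtralHFForConditionalGeneration  # hypothetical module

ModelRegistry.register_model(
    "LlavaForConditionalGeneration",  # assumed architecture name in the HF config
    PixtralHFForConditionalGeneration,
)
```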
I have started a draft here where we can load the weights properly. It still needs more work to properly perform inference: #9036
@Reichenbachian if you simply want to run your own fine-tuned version, a user has written a conversion script from HF format to Mistral format: https://github.com/spring-anth/transform_pixtral/blob/main/convert_hf_transformers_pixtral_model_to_vllm_compatible_version.py
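Very roughly, the idea of such a script is to load the HF-format weights, rename each state-dict key to its consolidated-format name, and write out a single consolidated.safetensors. The sketch below only illustrates the shape of that approach; its rename rules are placeholders, not the real mapping (consult the linked script for that):

```python
# A rough, hypothetical sketch of an HF -> consolidated-format conversion.
# The rename rules are placeholders only; the real key mapping is in the
# linked script.
from safetensors.torch import save_file
from transformers import LlavaForConditionalGeneration

def rename_key(hf_key: str) -> str:
    # Placeholder rules; consult the linked script for the actual mapping.
    return hf_key.replace("language_model.model.", "").replace(
        "vision_tower.", "vision_encoder."
    )

model = LlavaForConditionalGeneration.from_pretrained("path/to/finetuned-pixtral-hf")
state_dict = {rename_key(k): v.contiguous() for k, v in model.state_dict().items()}
save_file(state_dict, "consolidated.safetensors")
```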
The model to consider.
vLLM supports Mistral's "consolidated" format for the Pixtral model, found at: https://huggingface.co/mistral-community/pixtral-12b-240910
However, when HF implemented Pixtral in Transformers, they used a different format that leverages the existing Llava model structure. Model example: https://huggingface.co/mistral-community/pixtral-12b
HF PR reference: huggingface/transformers#33449
Supporting the HF version means we can produce quantized versions of the model with LLM Compressor.
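For illustration, that quantization step would look roughly like the sketch below (a minimal sketch following LLM Compressor's documented FP8-dynamic example; the ignore patterns and output directory are assumptions). It is only possible against the HF-format checkpoint, since LLM Compressor operates on Transformers models:

```python
# A minimal sketch, assuming LLM Compressor's oneshot API with an FP8-dynamic
# recipe (per its documented examples); ignore patterns and paths are assumptions.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"  # HF-format checkpoint
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Quantize the linear layers to FP8 with dynamic activation scales, leaving
# the LM head, vision tower, and projector in their original precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["re:.*lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

oneshot(model=model, recipe=recipe)

save_dir = "pixtral-12b-FP8-dynamic"
model.save_pretrained(save_dir, save_compressed=True)  # as in LLM Compressor's examples
processor.save_pretrained(save_dir)
```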
The closest model vllm already supports.
No response
What's your difficulty of supporting the model you want?
Easy to moderate; all operations should already be implemented inside of vLLM.