Add llama3-llava-next-8b to llava_next conversion script #31395
Conversation
Adds support for the lmms-lab/llama3-llava-next-8b model to the convert_llava_next_weights_to_hf.py script, along with an example prompt generated from the llava_llama_3 conv_template in the LLaVA-NeXT repo.
Hey! Thanks so much for adding support for new checkpoints ❤️
Just a few comments:
- Can you also add the chat format here in the docs to let users know which format to use? We don't have a `chat_template` yet, so it's better to indicate it in the docs for now.
- Weird that batched generation didn't work. Can you verify that `tokenizer.padding_side="left"`? Or maybe it's the incorrect prompt format? (See the sketch after this list.)
- I am wondering if we need to add the other checkpoints, of 72B and 110B size. This is more of a question to @NielsRogge.
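A minimal sketch of left-padded batched generation, assuming the `llava-hf/llama3-llava-next-8b-hf` checkpoint name that comes up later in this thread; this is illustrative, not the exact code from the conversion script:

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_id = "llava-hf/llama3-llava-next-8b-hf"  # assumed checkpoint name
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Decoder-only models must be padded on the left for batched generation,
# otherwise the shorter prompts continue generating from pad tokens.
processor.tokenizer.padding_side = "left"

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = (
    "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful language and vision assistant. "
    "You are able to understand the visual content that the user provides, and assist the user with a "
    "variety of tasks using natural language.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "<image>\nWhat is shown in this image?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

inputs = processor(images=[image, image], text=[prompt, prompt], padding=True, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(output, skip_special_tokens=True))
```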
Yes, would be great to have all checkpoints converted :)
Please also make sure to verify the outputs of the model against the original implementation, also on a logits level (ideally with …).
Currently doing a bunch of debugging to try to get near-identical logits. Findings so far: I'd added … After fixing the begin-of-text token, the logits are:

```
# LLaVA-NeXT repo
tensor([[ -3.9648,   1.1396,   3.3145],
        [ -5.3398,  -1.5537,  -1.9512],
        [-12.3828, -10.6797,  -9.3047]], device='cuda:0',
       grad_fn=<SliceBackward0>)

# HF LlavaNextForConditionalGeneration
tensor([[ -3.9648,   1.1396,   3.3145],
        [ -5.3594,  -1.5654,  -1.9619],
        [-12.3750, -10.6797,  -9.3125]], device='cuda:2', dtype=torch.float32)
```

Which is close but not within the tolerance.
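For reference, this is roughly what that slice-level check could look like with `torch.testing.assert_close`; the two 3×3 slices are copied from the output above, and the tolerance values are only an assumption, not the ones used in the actual conversion script:

```python
import torch

# First 3x3 slice of logits reported from the original LLaVA-NeXT repo
original_slice = torch.tensor(
    [[ -3.9648,   1.1396,   3.3145],
     [ -5.3398,  -1.5537,  -1.9512],
     [-12.3828, -10.6797,  -9.3047]]
)

# Same slice from the HF LlavaNextForConditionalGeneration port
hf_slice = torch.tensor(
    [[ -3.9648,   1.1396,   3.3145],
     [ -5.3594,  -1.5654,  -1.9619],
     [-12.3750, -10.6797,  -9.3125]]
)

# With the default float32 tolerances this would raise, which matches the
# "close but not within tolerance" situation described above:
# torch.testing.assert_close(hf_slice, original_slice)

# A looser (assumed) tolerance makes the comparison pass.
torch.testing.assert_close(hf_slice, original_slice, atol=3e-2, rtol=3e-2)
```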
This token gets added automatically, so it should not be included in the prompt example.
Also making a note that Llama 3 and Qwen actually have extra unused space in the embeddings, so we don't need to resize. For Llama 3 they are included in the tokenizer as …; for Qwen, the unused space isn't already allocated in the tokenizer, but …
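As a rough illustration of that point, a check along these lines (the model id is purely illustrative) would show that the tokenizer's vocab already fits inside the embedding matrix for Llama 3, so no resize is needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative; any Llama 3 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

vocab_size = len(tokenizer)                                    # counts the reserved/unused tokens too
embedding_rows = model.get_input_embeddings().weight.shape[0]  # rows already allocated in the embedding

# Only resize when the tokenizer has actually outgrown the embedding matrix.
if vocab_size > embedding_rows:
    model.resize_token_embeddings(vocab_size, pad_to_multiple_of=64)
```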
Adds the Qwen-based LLaVA-Next models to the conversion script, along with changes to load the models on multiple GPUs for inference.
Hey, I wonder why this PR is still pending.
Sorry, totally forgot about this PR. I think the only thing left now is to get batched generation fixed, if not done yet, and to create a section in the docs for the new models, as outlined above. @jamt9000 let me know if you're stuck and need help, or ping me if it's ready for review :)
I've added the prompt examples to the docs and checked that with the correct prompt and …
What I'm still stuck on is:
@jamt9000 again sorry for the delayed reply, I was off on vacation. Regarding the questions:
I left some comments. Let me know if you have bandwidth to address them, otherwise I can finish the PR for you and merge it next week :)
llama3-llava-next-8b-hf requires the following format:

```bash
"<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<image>\nWhat is shown in this image?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

llava-next-72b-hf and llava-next-110b-hf require the following format:

```bash
"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|>\n<|im_start|>assistant\n"
```
Great job! ❤️
We're trying to move to using `processor.apply_chat_template()`, so I will also add a `chat_template` for these models when uploading the weights.
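For context, a minimal sketch of the `processor.apply_chat_template()` flow mentioned here, assuming a transformers version where processors expose `apply_chat_template` and assuming the `llava-hf/llama3-llava-next-8b-hf` repo name used elsewhere in this thread:

```python
from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained("llava-hf/llama3-llava-next-8b-hf")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]

# Renders the conversation into the llama3-style prompt string documented above,
# so users don't have to hand-write the special tokens themselves.
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
print(prompt)
```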
```python
# For these big models we need to do multi-GPU inference, so reload the model with device_map="auto" in order to use accelerate
# Is there a way to do this without saving and reloading the model?
model.save_pretrained("/tmp/llava_qwen")
model = LlavaNextForConditionalGeneration.from_pretrained("/tmp/llava_qwen", device_map="auto")
```
Hmm, imo it's okay to save and load the weights back. I would even say that `pytorch_dump_folder_path` doesn't have to be optional, so we save it there and then load back for inference.
```diff
-text=[prompt, "[INST] <image>\nHow many cats are there? [/INST]"],
+text=[prompt, cats_prompt],
```
This can be changed to `[prompt, prompt]` so we don't have to manually match chat templates for the second prompt.
@zucchini-nlp Thanks! I'm on vacation this week, so do go ahead if you'd like to finish the PR yourself!
Ready for review: made the final changes and uploaded the weights to the Hub, along with their chat templates.
Thanks for adding!
Some general comments, e.g. not pushing to the Hub automatically.
```diff
     ],
     required=False,
 )
 parser.add_argument(
-    "--pytorch_dump_folder_path", default=None, type=str, help="Path to the output PyTorch model directory."
+    "--pytorch_dump_folder_path", type=str, required=True, help="Path to the output PyTorch model directory."
```
Why make this required?
We want to save the model and then load it again, so that it's loaded by automatically inferring how to split the weights across devices/CPU/disk depending on how much memory is available.
This is especially required for the big models we're adding: they need multi-GPU inference, but we can't init the model from a config in a multi-GPU setting.
@amyeroberts ready for review, addresses the comments.
Thanks for adding and iterating!
Thanks @jamt9000 💛 The converted model weights are already uploaded to the llava-hf org on the Hub!
What does this PR do?
Adds support for the lmms-lab/llama3-llava-next-8b model to the convert_llava_next_weights_to_hf.py script, along with an example prompt generated from the llava_llama_3 conv_template in the LLaVA-NeXT repo.
I've confirmed that the model seems to convert without errors and the output seems similar (but not identical) to the inference example from the LLaVA-NeXT repo; however, I'm not sure if the logic around the added tokens is correct.
The batched generation also seems to be off:
Fixes #31394
Who can review?
@NielsRogge @zucchini-nlp