
Llava: add default chat templates #31691

Merged
34 commits merged into huggingface:main on Jul 19, 2024

Conversation

zucchini-nlp
Member

What does this PR do?

This PR adds default chat templates for Llava models, so that user-defined models on the hub do not fail when using apply_chat_template. Users will see a warning whenever the default template is used.

Chat templates have been added to all llava-hf models on the hub; this can be verified with the script below.

```python
from transformers import AutoProcessor

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What’s the content of this image?"},
            {"type": "image"},
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "This picture shows a red stop sign."}],
    },
]

processor = AutoProcessor.from_pretrained("llava-hf/vip-llava-7b-hf")
convo = processor.apply_chat_template(messages)
print(convo)
# ###Human: <image>\nWhat’s the content of this image?###Assistant: This picture shows a red stop sign.
```
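
For readers unfamiliar with multimodal chat templates, here is a rough sketch of a Jinja template (written as a Python string, continuing from the snippet above) that reproduces the vip-llava style output shown. It is illustrative only; the exact template uploaded to the hub by this PR may differ.

```python
# Illustrative only: a Jinja template that produces the "###Human: ... ###Assistant: ..."
# format shown above. Not necessarily the exact template shipped by this PR.
vipllava_style_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}###Human: {% else %}###Assistant: {% endif %}"
    # emit the image placeholder first, regardless of where it appears in 'content'
    "{% for item in message['content'] %}"
    "{% if item['type'] == 'image' %}<image>\n{% endif %}"
    "{% endfor %}"
    "{% for item in message['content'] %}"
    "{% if item['type'] == 'text' %}{{ item['text'] }}{% endif %}"
    "{% endfor %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}###Assistant: {% endif %}"
)

# Passing a template explicitly should override whatever template the processor ships with.
convo = processor.apply_chat_template(messages, chat_template=vipllava_style_template)
```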

Collaborator

@amyeroberts left a comment

Thanks for adding!

Overall this looks good; some nits, and it just needs some more clarifying language.


Will create outputs like:

```
USER: <image>\nWhat is the content of this image? ASSISTANT: This picture shows a red stop sign
```
Collaborator

Why is the image token being placed before the text, when the text comes first in the content list? If this is the case, then it should be detailed in the bullet points describing the behaviour above.

Member Author

Llava models need the image to be placed before the text in all cases. I thought of writing the template so that it formats correctly even if users pass "image" in an arbitrary order.

I can make it follow the same order as the user's input, but then we need to update the docs/readme on the hub to explain how to use the template correctly. WDYT, is it better to use the user's order?


Will create outputs like:

```
USER: <image>\nWhat is the content of this image? ASSISTANT: This picture shows a red stop sign
```
Collaborator

Same question here re: the image token being placed first.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp and others added 3 commits June 28, 2024 16:41
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Contributor

@NielsRogge left a comment

Would it be possible to add a test_chat_template test to https://github.com/huggingface/transformers/blob/main/tests/models/llava/test_processor_llava.py and so on?

Just an assertion which looks like this:

```python
prompt = "What is in this image?"

manually_formatted_prompt = f"USER: <image>\n{prompt} ASSISTANT:"

messages = ...

formatted_prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

self.assertEqual(manually_formatted_prompt, formatted_prompt)
```

Because I've seen that the templates are quite tricky to get 100% right at the whitespace level.
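
Fleshing that suggestion out, here is a hedged sketch of what such a test could look like; the checkpoint, message payload, and expected string are illustrative and would need to match the template actually uploaded to the hub.

```python
import unittest

from transformers import AutoProcessor


class LlavaProcessorChatTemplateTest(unittest.TestCase):
    def test_chat_template(self):
        # illustrative checkpoint; the real test would use whatever checkpoint the test file targets
        processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
        prompt = "What is in this image?"
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        formatted_prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
        # expected string assumes the "USER: ... ASSISTANT:" format discussed in this PR
        self.assertEqual(formatted_prompt, f"USER: <image>\n{prompt} ASSISTANT:")
```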

zucchini-nlp and others added 4 commits June 28, 2024 17:41
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
@zucchini-nlp
Member Author

The tests will be failing until we add the templates on the hub. I will ping when we decide how to tackle the breaking change and it is ready for review.

@zucchini-nlp
Member Author

Removed the default chat templates (#31733 already removed all defaults) and added a hack to load chat templates from a separate json file called "chat_template.json". We can leave it as is for 2-3 major versions and then remove it, because I guess everyone will have a version with chat_template in the init by then.
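
For reference, a rough sketch of the kind of fallback described here; this is not the PR's actual implementation, but the file name and the "chat_template" key match the code quoted later in this review.

```python
import json
import os


def load_chat_template_fallback(model_dir: str):
    # If no chat_template is found in the processor config, look for a sibling
    # chat_template.json next to it and read the template from there.
    template_file = os.path.join(model_dir, "chat_template.json")
    if not os.path.isfile(template_file):
        return None
    with open(template_file, encoding="utf-8") as reader:
        return json.loads(reader.read())["chat_template"]
```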

Ready for review @amyeroberts :)

@zucchini-nlp zucchini-nlp requested a review from amyeroberts July 10, 2024 13:11
Collaborator

@amyeroberts left a comment

Thanks for working on adding this!

We need to be careful about how we handle backwards and future compatibility with regards to loading the chat template.

We'll also need to add general tests for the processor (perhaps in test_processor_common.py) which check loading when a chat_template.json is or isn't present.
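
Sketching what such a common test might check, assuming save_pretrained writes the template to chat_template.json as this PR describes; the helper name and assertions below are illustrative only.

```python
import os
import tempfile


def check_chat_template_round_trip(processor):
    # Illustrative helper: the template should survive a save/load cycle via
    # chat_template.json, and loading should still work when that file is absent.
    with tempfile.TemporaryDirectory() as tmpdir:
        processor.chat_template = "{{ messages }}"
        processor.save_pretrained(tmpdir)
        assert os.path.isfile(os.path.join(tmpdir, "chat_template.json"))

        reloaded = type(processor).from_pretrained(tmpdir)
        assert reloaded.chat_template == processor.chat_template

        # without the file, loading should not fail; there is simply no template
        os.remove(os.path.join(tmpdir, "chat_template.json"))
        reloaded = type(processor).from_pretrained(tmpdir)
        assert reloaded.chat_template is None
```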

Comment on lines 588 to 589

```python
# load chat template from a separate json if exists
# TODO @raushan: remove this hack after a few major releases
```
Collaborator

We can't do that, for two reasons:

  • Older versions of transformers will still break, as processors won't accept the chat template in the config. We can't assume people will have the newest versions installed.
  • People may have local copies of chat_template.json after this update. Even if we updated all public checkpoints on the hub to have the templates in the config, this would break things for anyone who is using a local checkpoint or even custom templates.

FWIW, I think it's a lot cleaner having the chat template in a separate file anyway. Templates can be verbose and can easily clutter up the config files. It's similar to keeping the vocab files separate for the tokenizers.

Member Author

Hmm, I thought we could deprecate chat_template.json once the old versions are too old to still be in anyone's env. I am okay with keeping chat_template.json, but wouldn't that be different from what we have for LLMs?

Another option I proposed was to add the template to the processor's tokenizer. If we don't want to have the template in the processor's config, this would be the better option IMO. WDYT?

Member Author

Older versions of transformers will still break as processors won't accept the chat template in the config.

Oh, also, continuing on this: we can discuss it internally on Slack later. We recently added an option for some processors to accept any kwargs and I was hoping to start integrating new kwargs for VLMs. Does this comment mean that we can't do it and will need another hack to load those kwargs?

Collaborator

Hmm, I thought we could deprecate chat_template.json once the old versions are too old to still be in anyone's env

The problem is: when would this happen? There are some people who are still installing transformers v3, so we can't assume everyone will be working from the latest versions, even a few months out. We can deprecate things within the code, which we have full control over. Unfortunately, as the configs sit outside the library, people can keep using them with any version from the one in which they were introduced up to now.

Another option I proposed was to add the template to the processor's tokenizer. If we don't want to have the template in the processor's config, this would be the better option IMO. WDYT?

Just to make sure we're talking about the same thing: this would mean adding it into the tokenizer's config for the checkpoint, e.g. the tokenizer config here for a llava checkpoint?

It's a solution, but I don't think it semantically makes sense: the tokenizer is for processing text, whereas this introduces information about images.

We recently added an option for some processors to accept any kwargs and I was hoping to start integrating new kwargs for VLMs. Does this comment mean that we can't do it and will need another hack to load those kwargs?

We definitely want this feature added for our processors, and it gives us flexibility for future compatibility. It won't however fix things for older versions. My understanding is that if the chat_template gets added into the config, then older versions would still break when trying to load the config file.

Member Author

It's a solution, but I don't think it semantically makes sense: the tokenizer is for processing text, whereas this introduces information about images.

Yes, I meant that checkpoint. Now I see why it's not in the tokenizer's config; the workaround of having a separate json for templates is indeed better then.

It won't however fix things for older versions. My understanding is that if the chat_template gets added into the config, then older versions would still break when trying to load the config file.

Yeah, this will be a problem in any case, as we won't have compatibility with older transformers versions given how breaking the new changes are. Dealing with hub-transformers compatibility is harder than it looked 😅 I guess for that VLM processor refactor feature we'll have to wait a while and roll the changes out slowly, to see users' reactions...

zucchini-nlp and others added 3 commits July 12, 2024 15:47
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@NielsRogge NielsRogge requested a review from Rocketknight1 July 12, 2024 12:10
@zucchini-nlp
Member Author

@amyeroberts I deleted the comment saying that it's a temporary hack, as we decided to leave the templates in their own json files. Ready for the final review.

@zucchini-nlp zucchini-nlp requested a review from amyeroberts July 16, 2024 05:11
Collaborator

@amyeroberts left a comment

Thanks for adding this!

Pretty much all nits, apart from one question about the case when pretrained_model_name_or_path is a file.

Comment on lines +638 to +639

```python
text = reader.read()
chat_template = json.loads(text)["chat_template"]
```
Collaborator

Is there a reason for doing this in two steps, i.e. getting the text and then calling json.loads, instead of using json.load on the reader directly in the context manager?
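
For reference, a sketch of the two patterns being compared; whether the one-step form can be used here depends on how the file was written, as discussed in the replies below.

```python
import json

# two-step, as in the quoted diff: read the raw text, then parse it
with open("chat_template.json", encoding="utf-8") as reader:
    text = reader.read()
    chat_template = json.loads(text)["chat_template"]

# one-step alternative the question refers to: parse the file object directly
with open("chat_template.json", encoding="utf-8") as reader:
    chat_template = json.load(reader)["chat_template"]
```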

Member Author

Ah, will simplify it. I was copying from the processor code.

Member Author

It can't be loaded with json.load because it was saved as a TextIOWrapper. Actually I don't know why we save it this way; I was copying from the processors. Maybe it has something to do with safe-saving 🤔

Collaborator

huh, interesting. Well, good to know :) Thanks for investigating

zucchini-nlp and others added 13 commits July 17, 2024 17:22
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@zucchini-nlp zucchini-nlp merged commit b873234 into huggingface:main Jul 19, 2024
23 checks passed
ArthurZucker pushed a commit that referenced this pull request Aug 22, 2024
ArthurZucker pushed a commit that referenced this pull request Aug 22, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024