
apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True #35582

Conversation

@mrsndmn (Contributor) commented Jan 9, 2025

What does this PR do?

This PR fixes the simultaneous use of the return_assistant_tokens_mask and return_tensors flags in the apply_chat_template method.

Related to issue #28950 and the corresponding PR #30650.

Before the fix:

import torch
from transformers import AutoTokenizer

# Illustrative setup: any chat tokenizer whose template marks assistant turns
# with {% generation %} works; the checkpoint and messages here are stand-ins.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
conversations = [
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}],
]

output = tokenizer.apply_chat_template(
    conversations,
    tokenize=True,
    padding=True,
    return_dict=True,
    return_assistant_tokens_mask=True,
    return_tensors='pt',
)

assert isinstance(output['attention_mask'], torch.Tensor)  # ok
assert isinstance(output['assistant_masks'], torch.Tensor) # not ok before the fix: `assistant_masks` is a plain `List`
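
A common downstream use of the mask is restricting the training loss to assistant tokens, which is where the tensor/list mismatch bites: a tensor mask composes directly with the other batched outputs. A minimal sketch with toy values (illustrative, not part of the PR; -100 is the default ignore_index of PyTorch's cross-entropy loss):

import torch

# Toy stand-ins for apply_chat_template output after the fix.
input_ids = torch.tensor([[5, 6, 7, 8]])
assistant_masks = torch.tensor([[0, 0, 1, 1]])

# Train only on assistant tokens: every other position is ignored by the loss.
labels = input_ids.clone()
labels[assistant_masks == 0] = -100  # ignored by torch.nn.CrossEntropyLoss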

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yonigottesman @amyeroberts @Rocketknight1

@mrsndmn force-pushed the apply_chat_template_return_assistant_masks_return_tensors branch from 406d7bf to d79b446 on January 9, 2025 at 13:05
@mrsndmn force-pushed the apply_chat_template_return_assistant_masks_return_tensors branch from eb47736 to 7349d16 on January 9, 2025 at 13:47
@mrsndmn (Contributor, Author) commented Jan 26, 2025

Hi! Please take a look.

@Rocketknight1 (Member) commented

Hi @mrsndmn, I'm sorry we missed this PR! @yonigottesman can you take a look? It's okay if you don't have time - just let me know and I can take it instead

@mrsndmn (Contributor, Author) commented Feb 3, 2025

Hey @yonigottesman, just checking in — did you have a chance to look at this PR? No rush, but let me know if you need anything from me!

@ArthurZucker (Collaborator) left a comment

LGTM let's merge 🤗

@ArthurZucker merged commit 2ba040a into huggingface:main on Feb 4, 2025
23 checks passed
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
…_mask=True return_tensors=True (huggingface#35582)

* apply_chat_template: consistent return_tensors behaviour with return_assistant_tokens_mask flag

* test_chat_template_return_assistant_tokens_mask: support tokenizers with no attention mask

* test_chat_template_return_assistant_tokens_mask: skip tokenizers with no padding token

* test_chat_template_return_assistant_tokens_mask: force tokenizer padding_side=right

---------

Co-authored-by: Eduard Allakhverdov <goncharova@airi.net>
Co-authored-by: d.tarasov <d.tarasov@airi.net>
sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 16, 2025
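
The test-related bullets in the commit message describe precondition guards; a hedged sketch of what they amount to (the helper name and structure are hypothetical, not the actual test code):

import pytest

# Hypothetical sketch of the guards named in the commit message; the real
# test lives in the transformers test suite and runs across many tokenizers.
def _prepare_tokenizer_for_assistant_mask_test(tokenizer):
    if tokenizer.pad_token is None:
        pytest.skip("no padding token: padded batches cannot be built")
    # Some tokenizers do not emit an attention mask at all; model_input_names
    # lists which outputs the tokenizer produces, so that check becomes optional.
    has_attention_mask = "attention_mask" in tokenizer.model_input_names
    tokenizer.padding_side = "right"  # forced so mask positions stay aligned
    return tokenizer, has_attention_mask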