
Fix contrastive search to correctly handle input with padding #33507

Merged

Conversation

@ducviet00 (Contributor) commented Sep 16, 2024

What does this PR do?

This PR fixes contrastive search to correctly handle input padding in decoder-only & encoder-decoder models.

Details

I encountered this issue and found that the contrastive search implementation is highly sensitive to padding tokens.

For example, with the prompt: The whispered legends of the haunted mansion spoke, the output from the Hugging Face implementation without padding is:

  • The whispered legends of the haunted mansion spoke of the "souls of the dead" who were "falling out of the sky" and "falling into the sea."\n

However, with padding tokens, the output becomes:

  • The whispered legends of the haunted mansion spoke of the "soul of the dead" and the "blood of the dead."\nThe ghost of Dr. H. P. Lovecraft was a man

You can check the Colab notebook that demonstrates the issue here
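A minimal reproduction along these lines (the checkpoint, padding length, and generation settings below are illustrative assumptions; the Colab may use different ones):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The whispered legends of the haunted mansion spoke"

# Contrastive search without padding (penalty_alpha > 0 and top_k > 1).
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=40,
                     pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Same prompt, left-padded: before this fix, the pad tokens leak into the
# degeneration penalty and change the generated continuation.
tokenizer.padding_side = "left"
padded = tokenizer(prompt, return_tensors="pt", padding="max_length", max_length=32)
out_padded = model.generate(**padded, penalty_alpha=0.6, top_k=4, max_new_tokens=40,
                            pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(out_padded[0], skip_special_tokens=True))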

How did I fix the issue

The issue arises when input_ids contain padding tokens: the model's hidden_states then include embeddings for these padding tokens if the model does not eliminate them during processing. The _ranking_fast function also computes its scores over these padding positions, leading to incorrect outputs. This is critical because it significantly degrades generation quality.

To fix this, I created a cosine_matrix_mask based on the attention_mask and used it to penalize the cosine_matrix at padding positions with large negative values, so those positions are ignored. As generation proceeds, the cosine_matrix_mask is then padded with ones to match the output length.
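Roughly, the masking step looks like the sketch below (a simplified illustration, not the exact diff; it assumes cosine_matrix has shape [batch_size * top_k, context_len] and cosine_matrix_mask is 1 at real-token positions and 0 at padding positions):

import torch
import torch.nn.functional as F

def mask_padding_in_cosine_matrix(cosine_matrix, cosine_matrix_mask):
    # Push cosine similarities at padded positions to a very large negative
    # value so the max() used for the degeneration penalty never selects them.
    min_value = torch.finfo(cosine_matrix.dtype).min
    return cosine_matrix.masked_fill(cosine_matrix_mask == 0, min_value)

# Each generation step appends a real (non-padding) token, so the mask is
# extended with a one to keep its length in sync with the growing context:
# cosine_matrix_mask = F.pad(cosine_matrix_mask, (0, 1), value=1)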

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@gante @ArthurZucker @amyeroberts @Rocketknight1

@ducviet00 force-pushed the fix-contrastive-search-inconsistency branch from 133f680 to f311827 on September 16, 2024 13:01
@ducviet00 force-pushed the fix-contrastive-search-inconsistency branch 4 times, most recently from b0f508d to f0add62 on September 17, 2024 07:49
@gante (Member) left a comment

@ducviet00 thank you for identifying the underlying numerical issue, proposing a very reasonable fix, and implementing it with a test 💛

I agree with the suggested fix: by masking cosine_matrix with a large negative value when the corresponding tokens are masked, degeneration_penalty will never be related to the masked tokens and, therefore, masked tokens will not have an impact on contrastive_score.

I've added a few minor suggestions

@ducviet00 (Contributor Author)

@gante Thanks for your feedback! I initially thought encoder-decoder models wouldn’t be affected since decoder_input_ids don’t seem to need padding tokens, except in continuous batching (not in this case). Applying the cosine_matrix_mask to these models might add some minor overhead, though. I hadn’t considered the decoder prompt before, so thanks for pointing that out!

I’ll push an update with support for encoder-decoder models based on your suggestions soon.

@ducviet00 (Contributor Author)

@gante
I pushed an update to support encoder-decoder models. I also added a test for the T5 model, with padding in decoder_input_ids. On the main branch, outputs_with_padding is:

  • Ich muss diese Aufgabe noch vor Ende des Tages beenden.

It should be:

  • Ich muss diese Aufgabe vor Ende des Tages beenden.

You can check the code inside test_padding_input_contrastive_search_t5 in tests/generation/test_utils.py.
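In outline, the new check works like the sketch below (the checkpoint and the exact decoder prompt tensors here are illustrative assumptions; the real test in tests/generation/test_utils.py is authoritative):

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: I have to finish this task before the end of the day.",
    return_tensors="pt",
)

# Decoder prompt without padding vs. the same prompt left-padded with pad tokens.
start_id = model.config.decoder_start_token_id
pad_id = model.config.pad_token_id
decoder_input_ids = torch.tensor([[start_id]])
decoder_input_ids_padded = torch.tensor([[pad_id, pad_id, start_id]])
decoder_attention_mask = torch.tensor([[0, 0, 1]])

out = model.generate(**inputs, decoder_input_ids=decoder_input_ids,
                     penalty_alpha=0.6, top_k=4)
out_padded = model.generate(**inputs, decoder_input_ids=decoder_input_ids_padded,
                            decoder_attention_mask=decoder_attention_mask,
                            penalty_alpha=0.6, top_k=4)

# With the fix, both calls should decode to the same translation.
print(tokenizer.batch_decode(out, skip_special_tokens=True))
print(tokenizer.batch_decode(out_padded, skip_special_tokens=True))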

Comment on lines +2600 to +2608
# Create cosine_matrix_mask based on the attention_mask
cosine_matrix_mask = torch.ones_like(input_ids, dtype=torch.long)
if self.config.is_encoder_decoder:
    if "decoder_attention_mask" in model_kwargs and model_kwargs["decoder_attention_mask"] is not None:
        cosine_matrix_mask = model_kwargs["decoder_attention_mask"]
else:
    cosine_matrix_mask = model_kwargs["attention_mask"]
cosine_matrix_mask = cosine_matrix_mask.repeat_interleave(top_k, dim=0)

@ducviet00 (Contributor Author)

This code initializes a default mask and then updates it based on the model type.

  • For encoder-decoder models, if model_kwargs contains a decoder_attention_mask (and it is not None), cosine_matrix_mask is set to this mask. If decoder_attention_mask is missing, it falls back to the default mask.
  • For decoder-only models, cosine_matrix_mask is set to the attention_mask from model_kwargs.

Please let me know if there are any additional logic checks that need to be added.
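For context, here is a simplified sketch of where cosine_matrix_mask is consumed downstream; it only approximates the ranking step and is not the verbatim _ranking_fast implementation (shapes assumed: context_hidden [batch * top_k, seq_len, dim], next_hidden [batch * top_k, 1, dim], cosine_matrix_mask [batch * top_k, seq_len]):

import torch

def masked_ranking(context_hidden, next_hidden, next_top_k_probs, cosine_matrix_mask, alpha, beam_width):
    # Cosine similarity of each candidate's hidden state against every context position.
    norm_context = context_hidden / context_hidden.norm(dim=2, keepdim=True)
    norm_next = next_hidden / next_hidden.norm(dim=2, keepdim=True)
    cosine_matrix = torch.matmul(norm_context, norm_next.transpose(1, 2)).squeeze(-1)

    # Padding positions get a very large negative similarity so the max() below
    # (the degeneration penalty) can never be driven by padded tokens.
    min_value = torch.finfo(cosine_matrix.dtype).min
    cosine_matrix = cosine_matrix.masked_fill(cosine_matrix_mask == 0, min_value)

    degeneration_penalty, _ = torch.max(cosine_matrix, dim=-1)
    contrastive_score = (1.0 - alpha) * next_top_k_probs.view(-1) - alpha * degeneration_penalty
    contrastive_score = torch.stack(torch.split(contrastive_score, beam_width))
    _, selected_idx = contrastive_score.max(dim=-1)
    return selected_idx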

@ducviet00 requested a review from gante on September 17, 2024 20:22
@ducviet00 force-pushed the fix-contrastive-search-inconsistency branch from cde4634 to 6a86c0d on September 17, 2024 20:25
@ducviet00 force-pushed the fix-contrastive-search-inconsistency branch from 955fdfb to b8a08dd on September 18, 2024 03:18
@ducviet00
Copy link
Contributor Author

Hi @gante
PTAL 🤗

@ducviet00 changed the title from "Fix contrastive search to correctly handle input padding in decoder-only models" to "Fix contrastive search to correctly handle input padding" on Sep 19, 2024
@ducviet00 changed the title from "Fix contrastive search to correctly handle input padding" to "Fix contrastive search to correctly handle input with padding" on Sep 19, 2024
@gante (Member) left a comment

Thank you for iterating 💛

@gante requested a review from LysandreJik on September 19, 2024 18:07
@LysandreJik (Member) left a comment

Very impressive change and tests! Thanks for your PR @ducviet00

@gante merged commit dc8b6ea into huggingface:main on Sep 20, 2024
20 checks passed
@gante (Member) commented Sep 20, 2024

@ducviet00 thank you for making transformers better for everyone 💛

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…gface#33507)

* fix: handle padding in contrastive search for decoder-only models

* fix: handle padding in contrastive search for encoder-decoder models

* tests: move padding contrastive test to test_util, add t5 test

* fix: handle if model_kwargs["decoder_attention_mask"] is None

* refactor: improve padding input contrastive search generation tests

* chore: _ranking_fast to use LongTensor for cosine_matrix_mask