
🚨🚨 Fix beam score calculation issue for decoder-only models #27351

Merged · gante merged 4 commits into huggingface:main on Nov 15, 2023

Conversation

@VsonicV (Contributor) commented Nov 7, 2023

What does this PR do?

This PR fixes issue #26624. In the original implementation of beam search, the beam score for decoder-only models is normalized by the total length of the prompt plus the generated sequence. However, the prompt length should not be included in the normalization step; including it biases beam search toward generating shorter sequences.

This is a simple, quick fix: an optional parameter decoder_prompt_len, which stores the length of the decoder prompt, is added to BeamSearchScorer.process(), BeamSearchScorer.finalize() and BeamHypotheses.add(). Since the new parameter is optional with a default value of 0, any existing calls to these functions that do not specify decoder_prompt_len behave exactly as before, avoiding any unexpected incompatibility. The corner case in which the very first generated token happens to be eos_token (an empty generation) is considered and handled.
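In spirit, the corrected scoring looks like the following simplified sketch (illustrative only; the actual patch threads decoder_prompt_len through the BeamSearchScorer methods rather than using a standalone helper):

```python
# Simplified sketch of the corrected length normalization (not the merged code).
def beam_score(sum_logprobs: float, seq_len: int,
               decoder_prompt_len: int = 0, length_penalty: float = 1.0) -> float:
    # Normalize by the generated length only, not prompt + generation.
    # The max(..., 1) guard illustrates the empty-generation corner case
    # (the first generated token is eos_token); the PR handles it explicitly.
    generated_len = max(seq_len - decoder_prompt_len, 1)
    return sum_logprobs / (generated_len ** length_penalty)
```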

Fixes #26624

Note: There are three follow-up PRs that complement this fix:

  1. Fix remaining issues in beam score calculation #27808 further fixes some remaining issues in the PyTorch version.
  2. Fix beam score calculation issue for Tensorflow version #27814 fixes the TensorFlow version.
  3. Fix beam score calculation issue for JAX version #27816 fixes the JAX version.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@gante

@VsonicV (Contributor, Author) commented Nov 7, 2023

@gante This commit is only for fixing beam_search. If you think the fix is good to go, I can also apply the same fix to beam_sample, group_beam_search and constrained_beam_search.

@VsonicV (Contributor, Author) commented Nov 7, 2023

@gante I think the current tests for beam_search use results generated by the previous "buggy" version, so the new beam_search cannot pass the test test_eos_token_id_int_and_list_beam_search, which uses the decoder-only GPT-2. We need to update the relevant tests as well.
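A toy calculation (invented numbers) shows how normalizing by the full length biased selection toward shorter candidates, which is why the expected outputs of the old tests change:

```python
# Prompt of 10 tokens, length_penalty = 1.0; scores are negative,
# so "closer to zero" wins.
# Candidate A: 3 generated tokens, total logprob -3.0
# Candidate B: 8 generated tokens, total logprob -6.0
old_a, old_b = -3.0 / (10 + 3), -6.0 / (10 + 8)  # -0.231 vs -0.333 -> A (shorter) wins
new_a, new_b = -3.0 / 3, -6.0 / 8                # -1.000 vs -0.750 -> B (longer) wins
```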

@gante (Member) left a comment

LGTM, thank you for jumping in to fix it!

Regarding the GPT-2 tests: agreed, they should be updated.

@gante (Member) commented Nov 7, 2023

@VsonicV the "setup and quality" CI can be fixed by running make fixup on your local transformers root folder and committing the changes!

@VsonicV (Contributor, Author) commented Nov 8, 2023

@gante Thanks for the suggestion! I have fixed the code quality issue using make fixup and updated the relevant test test_eos_token_id_int_and_list_beam_search with the new expected value. Both checks pass now. However, there is still one check failure caused by test_run_image_classification_no_trainer and test_run_ner_no_trainer, which should be unrelated to these beam search commits. Do you have any clue how to fix it?

@gante (Member) commented Nov 8, 2023

@VsonicV perfect! The failing CI is indeed unrelated (fix: #27353); the tests should pass after it gets merged.

To keep this fix consistent across the beam methods, I'd like to ask you to:

  1. Also apply this change to the other beam methods in PyTorch :)
  2. Add 🚨🚨 to the PR title, as this is a (correct, but also) breaking change
  3. (optional, only if you're comfortable with it, as the fix is slightly different) Apply this change to the beam methods in TF and JAX

After 1 and 2 are done, I'll tag a core maintainer to greenlight the merge!

@VsonicV (Contributor, Author) commented Nov 8, 2023

@gante Sure! I will work on 1 and 2 in the next two days, and will try to do 3 after that.

VsonicV changed the title from "Fix beam score calculation issue for decoder-only models" to "🚨🚨Fix beam score calculation issue for decoder-only models" on Nov 8, 2023
VsonicV changed the title from "🚨🚨Fix beam score calculation issue for decoder-only models" to "🚨🚨 Fix beam score calculation issue for decoder-only models" on Nov 8, 2023
@gante (Member) commented Nov 8, 2023

The PR causing the CI to fail was merged, and I was informed that current PRs will need to be rebased to pass CI 🤗

@VsonicV (Contributor, Author) commented Nov 9, 2023

@gante Items 1 and 2 are done! I have applied the fix to all beam-related methods: beam_sample, group_beam_search and constrained_beam_search. I have rebased the PR and all relevant tests have passed.

Regarding the remaining check failures: the recent merge only fixes the failure caused by test_run_image_classification_no_trainer, not the one caused by test_run_ner_no_trainer. According to the error message AssertionError: 0.5109489440917969 not less than 0.5, the threshold in self.assertLess(result["train_loss"], 0.5) in test_run_ner_no_trainer needs to be adjusted as well. Moreover, there is one new check failure caused by test_cached_model_has_minimum_calls_to_head and test_cached_tokenizer_has_minimum_calls_to_head, which is unrelated to the commits in this PR (it only appeared after the most recent rebase).

@gante (Member) commented Nov 9, 2023

@VsonicV yes, we are still having some CI failures (unrelated to this PR) 😭

@VsonicV (Contributor, Author) commented Nov 12, 2023

@gante I tried rebasing once more; all the previous check failures are gone, but there is one new CI failure caused by test_assisted_decoding_sample, which should again be unrelated to this PR.

@VsonicV (Contributor, Author) commented Nov 15, 2023

@ArthurZucker I have rebased this PR on top of all your recently added test skips, but the CI failures caused by test_assisted_decoding_sample still persist for blenderbot; the same failures also happened for pegasus and umt5 in my previous tries. Would you mind adding skips of test_assisted_decoding_sample for blenderbot, pegasus and umt5 as well? Thank you!

@ArthurZucker (Collaborator) commented Nov 15, 2023

Yeah, I'll skip this test for everyone; this is getting annoying! 😅 #27511 was merged

@@ -633,7 +633,7 @@ def test_eos_token_id_int_and_list_beam_search(self):
     "do_sample": False,
     "num_beams": 3,
 }
-expectation = 13
+expectation = 20
@gante (Member) commented on the diff:

Suggested change:
-expectation = 20
+if is_pt:
+    expectation = 20
+else:
+    # TODO (joao): fix me
+    expectation = 13

This test will likely fail on TF, since we haven't applied this upgrade there. Let's add a TODO for now

@VsonicV (Contributor, Author):

Good catch

gante requested a review from amyeroberts on November 15, 2023, 11:28
@gante (Member) commented Nov 15, 2023

Tagging @amyeroberts for a final check

@amyeroberts (Collaborator) left a comment

Thanks for adding!

Comment on lines 227 to 229:

cur_len = (
    input_ids.shape[-1] - decoder_prompt_len + 1
)  # add up to the length which the next_scores is calculated on
@amyeroberts (Collaborator):

nit

Suggested change:
-cur_len = (
-    input_ids.shape[-1] - decoder_prompt_len + 1
-)  # add up to the length which the next_scores is calculated on
+# add up to the length which the next_scores is calculated on
+cur_len = input_ids.shape[-1] - decoder_prompt_len + 1

Comment on lines 563 to 565:

cur_len = (
    input_ids.shape[-1] - decoder_prompt_len + 1
)  # add up to the length which the next_scores is calculated on
@amyeroberts (Collaborator):

nit

Suggested change:
-cur_len = (
-    input_ids.shape[-1] - decoder_prompt_len + 1
-)  # add up to the length which the next_scores is calculated on
+# add up to the length which the next_scores is calculated on
+cur_len = input_ids.shape[-1] - decoder_prompt_len + 1

@@ -511,6 +520,7 @@ def process(
     pad_token_id: Optional[int] = None,
     eos_token_id: Optional[Union[int, List[int]]] = None,
     beam_indices: Optional[torch.LongTensor] = None,
+    decoder_prompt_len: Optional[int] = 0,
@amyeroberts (Collaborator):

Can you add this arg to the docstring below?

@VsonicV (Contributor, Author):

good catch, will do
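For reference, the added docstring entry could look roughly like this, following the library's usual Args format (wording is a sketch, not the merged text):

```python
decoder_prompt_len (`int`, *optional*, defaults to 0):
    The length of the prompt that is part of the decoder input. Used to
    normalize the beam score by the number of generated tokens only.
```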

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@VsonicV (Contributor, Author) commented Nov 15, 2023

@gante @amyeroberts All your suggested changes have been added and committed, and all the tests now pass (finally!). This should be ready to merge.

gante merged commit 453079c into huggingface:main on Nov 15, 2023 · 2 checks passed
@gante (Member) commented Nov 15, 2023

@VsonicV Thank you for iterating with us and making transformers better 💛

And sorry for all the failing CI; you've caught an unfortunate series of failures 😬

@VsonicV (Contributor, Author) commented Nov 15, 2023

@gante No problem! Regarding the fix for the TF and JAX versions, I have looked at the relevant code briefly and I think I can fix them. I will try to submit another PR fixing both TF and JAX later this week.

VsonicV deleted the fix_beam_score branch on November 15, 2023, 13:06
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request on Nov 19, 2023:

… (huggingface#27351)

* Fix beam score calculation issue for decoder-only models
* Update beam search test and fix code quality issue
* Fix beam_sample, group_beam_search and constrained_beam_search
* Split test for pytorch and TF, add documentation

Co-authored-by: Xin Qiu <xin.qiu@sentient.ai>
@VsonicV (Contributor, Author) commented Dec 1, 2023

@gante Sorry about the delay in the next steps. I had a severe flu last week and just recovered. I will start working on the remaining fixes.

@VsonicV (Contributor, Author) commented Dec 4, 2023

@gante @amyeroberts All follow-up tasks have been completed in three new PRs:

  1. Fix remaining issues in beam score calculation #27808 further fixes some remaining issues in the PyTorch version.
  2. Fix beam score calculation issue for Tensorflow version #27814 fixes the TensorFlow version.
  3. Fix beam score calculation issue for JAX version #27816 fixes the JAX version.

All three PRs have passed the CI checks. Ready for your review, @gante.

@VsonicV (Contributor, Author) commented Dec 14, 2023

@gante Hi, I noticed that in the recent release notes for v4.36.0, only this PR is listed in the "Beam score calculation for decoder-only models" section under "Breaking changes". Should we also add the three follow-up PRs (#27808, #27814, #27816) under that section? That would make it clearer for people to see all the changes relevant to this breaking change. Thanks.

@gante (Member) commented Jan 9, 2024

@VsonicV updated the release notes for future reference 👍 Thank you for your suggestion

Linked issue: Beam search calculates mean logprobs wrong? (#26624)