Fix remaining issues in beam score calculation #27808
Conversation
Good catch, thanks for fixing!
BTW, run `RUN_SLOW=1 py.test tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::ViT2GPT2ModelIntegrationTest::test_inference_coco_en` -- this test may need its expected value updated.
if decoder_prompt_len == input_ids.shape[-1]:
    continue
# add up to the length which the next_scores is calculated on
generated_len = input_ids[batch_beam_idx].shape[-1] + 1 - decoder_prompt_len
If I'm not mistaken, this is the same as `cur_len` (L228). I'd suggest renaming `cur_len` to `generated_len`, which is more representative of the variable's contents!
Yes, `input_ids[batch_beam_idx].shape[-1]` should be the same for each `batch_beam_idx`, so we can simply use `cur_len` here.
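For concreteness, here is a tiny sketch of why the per-beam expression reduces to a single `cur_len`-based value: `process` is called once per generation step, so every row of `input_ids` has the same length. The shapes and values below are made up for illustration, not taken from the PR.

```python
import torch

# Illustrative shapes only (assumptions): 2 batches x 3 beams, current sequence
# length 7, decoder prompt length 4.
batch_size, num_beams, seq_len, decoder_prompt_len = 2, 3, 7, 4
input_ids = torch.zeros((batch_size * num_beams, seq_len), dtype=torch.long)

# Length that next_scores is calculated on (one token past the current input).
cur_len = input_ids.shape[-1] + 1

# Every row has the same length, so the per-beam expression is the same value
# for every batch_beam_idx and equals cur_len - decoder_prompt_len.
for batch_beam_idx in range(batch_size * num_beams):
    generated_len = input_ids[batch_beam_idx].shape[-1] + 1 - decoder_prompt_len
    assert generated_len == cur_len - decoder_prompt_len
```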
):
    """
    Add a new hypothesis to the list.
    """
    score = sum_logprobs / ((hyp.shape[-1] - decoder_prompt_len) ** self.length_penalty)
    if generated_len is not None:
I'd add a note that the `else` case here exists for retrocompatibility reasons :)
Thanks for the reminder. Added!
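For reference, a minimal sketch of what the updated scoring branch looks like after this change. It is simplified and self-contained, not the full `BeamHypotheses` class from the library; the class name and constructor arguments here are assumptions.

```python
from typing import Optional

import torch


class BeamHypothesesSketch:
    """Simplified stand-in for BeamHypotheses, showing only the scoring change."""

    def __init__(self, num_beams: int, length_penalty: float):
        self.num_beams = num_beams
        self.length_penalty = length_penalty
        self.beams = []

    def add(self, hyp: torch.LongTensor, sum_logprobs: float, generated_len: Optional[int] = None):
        """Add a new hypothesis, normalizing the score by the number of generated tokens."""
        if generated_len is not None:
            # New path: the caller passes the generated length directly, so
            # `process` and `finalize` normalize beam scores consistently.
            score = sum_logprobs / (generated_len**self.length_penalty)
        else:
            # This 'else' case exists for retrocompatibility with callers that do not
            # pass `generated_len`; it falls back to the full hypothesis length.
            score = sum_logprobs / (hyp.shape[-1] ** self.length_penalty)
        self.beams.append((score, hyp))
```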
@gante I have updated the expectation value for this test. I have also incorporated all your suggestions. Ready to go!
Perfect, thanks for iterating 💛
Note for @ArthurZucker: there may be slow CI failures due to this change, although I suspect there won't be any since the correction is small. In any case, I'll keep an eye on the CI after this gets merged.
@@ -634,7 +634,7 @@ def test_eos_token_id_int_and_list_beam_search(self):
        "num_beams": 3,
    }
    if is_pt:
nit: we can actually remove this if/else, if the result is back to being the same :P
redundant if/else removed
All suggested changes are incorporated. Ready to go! @gante @ArthurZucker
thanks for this! 🤗
score = sum_logprobs / ((hyp.shape[-1] - decoder_prompt_len) ** self.length_penalty)
if generated_len is not None:
    score = sum_logprobs / (generated_len**self.length_penalty)
# This 'else' case exists for retrocompatibility
Suggested change:
- # This 'else' case exists for retrocompatibility
+ # This 'else' case exists for backward compatibility
oops, the PR is already merged, maybe let's stay with it for now?
of course no worries
if is_pt:
    expectation = 20
else:
    # TODO (joao): fix me
    expectation = 13
nice! 🔥
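To illustrate the simplification in the diff above: once the frameworks account for beam length the same way, the branch collapses to a single expectation. The sketch below reuses the values shown in the diff; which single value the test settles on is an assumption here (the PT one is used for illustration).

```python
# Before the fix: frameworks disagreed on the generated length, so the test branched.
def expected_length_before(is_pt: bool) -> int:
    if is_pt:
        return 20
    # divergent value the fix removes
    return 13


# After the fix: the length accounting matches across frameworks, so one value suffices
# (assumed here to be the PT value).
def expected_length_after(is_pt: bool) -> int:
    return 20
```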
What does this PR do?
This PR further fixes the remaining issues in beam score calculation following #27351.
More specifically:
- `hyp` in `process` does not include the next generated token on which the current beam score is calculated, but the `hyp` in `finalize` includes all the generated tokens so far. This inconsistency is resolved by changing the `add` function of `BeamHypotheses`: now we directly pass the current length of the generated tokens to `add`.
- In the `is_done` function of `BeamHypotheses`, we were directly using `max_length` without deducting `decoder_prompt_len`. It is fixed now (see the sketch after this list).
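A minimal sketch of the `is_done` correction described in the second bullet. It is simplified: the real method also handles the different `early_stopping` modes, and the class name, constructor arguments, and `worst_score` handling here are assumptions for illustration.

```python
class BeamHypothesesIsDoneSketch:
    """Simplified stand-in showing only the max_length / decoder_prompt_len correction."""

    def __init__(self, num_beams: int, length_penalty: float, max_length: int):
        self.num_beams = num_beams
        self.length_penalty = length_penalty
        self.max_length = max_length
        self.beams = []
        self.worst_score = 1e9

    def is_done(self, best_sum_logprobs: float, decoder_prompt_len: int = 0) -> bool:
        if len(self.beams) < self.num_beams:
            return False
        # Before the fix, `self.max_length` was used as-is; now the decoder prompt
        # length is deducted so only generated tokens count toward the length penalty.
        highest_attainable_score = (
            best_sum_logprobs / (self.max_length - decoder_prompt_len) ** self.length_penalty
        )
        return self.worst_score >= highest_attainable_score
```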
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@gante