Generate: improve assisted generation tests #27540
Conversation
```diff
@@ -1524,62 +1529,49 @@ def test_assisted_decoding_matches_greedy_search(self):
        ):
            self.skipTest("May fix in the future: need model-specific fixes")

        # This for loop is a naive and temporary effort to make the test less flaky.
        failed = 0
        for i in range(10):
```
This was essentially the same as `@is_flaky`, but (IMO) less elegant.

Now that we understand the cause for the mismatch (matmul with different shapes), and know that there is no workaround, it is safe to confirm that this test is indeed flaky :)
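For context, a minimal sketch contrasting the two patterns being compared here: the manual retry loop that was removed versus the `is_flaky` decorator from `transformers.testing_utils`. The test bodies and the helper method are illustrative stand-ins, not the actual test code:

```python
# Sketch only: contrasts a manual retry loop with the @is_flaky decorator.
# The test bodies and `_outputs_match` helper are illustrative, not real test code.
import unittest

from transformers.testing_utils import is_flaky


class AssistedDecodingTests(unittest.TestCase):
    def _outputs_match(self) -> bool:
        """Hypothetical helper standing in for the greedy-vs-assisted comparison."""
        return True

    # Before: a manual retry loop that tolerates occasional mismatches.
    def test_with_manual_retry(self):
        failed = 0
        for _ in range(10):
            if not self._outputs_match():
                failed += 1
        self.assertLessEqual(failed, 1)

    # After: the same intent expressed with @is_flaky, which re-runs the test
    # a few times before reporting a failure.
    @is_flaky()
    def test_with_is_flaky(self):
        self.assertTrue(self._outputs_match())
```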
tests/generation/test_utils.py
```diff
@@ -1520,66 +1525,53 @@ def test_assisted_decoding_matches_greedy_search(self):
            self.skipTest("Won't fix: old model with different cache format")
        if any(
            model_name in model_class.__name__.lower()
-           for model_name in ["bigbirdpegasus", "led", "mega", "speech2text", "git", "prophetnet"]
+           for model_name in ["bigbirdpegasus", "led", "mega", "speech2text", "git", "prophetnet", "seamlessm4t"]
```
Note: `seamlessm4t` was already in the skip list of `test_assisted_decoding_sample`, probably for the same post mortem reasons.
Thanks for adding!
Great comments to provide context in the tests 🙏 Only comment is about having `config.is_decoder` set for all these tests. Is the case when `config.is_encoder_decoder` fully covered?
```diff
@@ -1599,18 +1609,27 @@ def test_assisted_decoding_sample(self):
        config.use_cache = True
        config.is_decoder = True
```
Can we also have a test for when `config.is_encoder_decoder` is set, to make sure any relevant logic is handled there?
@amyeroberts It is also not mutually exclusive with `config.is_encoder_decoder`. All tests that require caching, such as the assisted generation ones, have to set `config.is_decoder`.
@gante Thanks for explaining! I thought they were mutually exclusive.
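As a side note, a minimal sketch of the flag combination discussed in this thread. The choice of `T5Config` and the exact values are illustrative (based on the diff above), not a prescribed setup:

```python
# Sketch only: the flags below mirror the diff above; the model choice is illustrative.
from transformers import T5Config

config = T5Config()                # an encoder-decoder architecture
config.use_cache = True            # assisted generation relies on the cache
config.is_decoder = True           # not mutually exclusive with is_encoder_decoder;
                                   # tests that require caching set it as well

# T5Config already has is_encoder_decoder=True by default, so both flags coexist.
print(config.is_encoder_decoder, config.is_decoder, config.use_cache)
```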
Thank you 🙏
What does this PR do?
Strengthens the test suite for assisted generation. With these modifications, API problems like the ones previously found will be caught in advance.
Post mortem
Why weren't API problems caught before?
Assisted generation has two loops: the loop to obtain the candidate tokens from the assistant model (inner loop), and the loop to generate the final tokens from the main model (outer loop). Both loops are slightly different depending on whether the main model accepts the matches or not -- there are different code paths depending on whether `n_matches > 0` or not (a simplified sketch of this two-loop structure is included at the end of this description).

The following cases were being tested and had no API issues:

- `n_matches == 0`
- `n_matches > 0`, but we only run 1 iteration of the outer loop

👉 We weren't explicitly testing the case where `n_matches > 0` AND we ran more than 1 outer loop iteration.

If we weren't testing that case, why was the CI randomly red?
Each individual test had a ~97% chance of being green. The (random) assistant model was building the candidate sequence from the most likely tokens from its vocabulary (size = 99), and the main model was comparing the candidate sequence against sampling from its logits. Most of the time, `n_matches == 0`, so the test passed. However, sometimes we had `n_matches > 0`, but not enough to complete assisted generation in a single outer loop iteration.

👉 There was a low chance (per test) of hitting the failing case, resulting in inconsistent CI failures.
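For readers less familiar with the code paths mentioned above, here is a heavily simplified, self-contained sketch of the two-loop structure of assisted generation. The toy models, helper names, and numbers are illustrative; this is not the actual `generate` implementation:

```python
# Toy sketch of assisted generation's two loops; not the real transformers code.
import random


def toy_main_model(prefix):
    """Stand-in for the main model's greedy next-token choice."""
    random.seed(sum(prefix))        # deterministic per prefix, for reproducibility
    return random.randrange(99)     # vocabulary of size 99, as in the tests


def toy_assistant_model(prefix):
    """Stand-in for the smaller, faster assistant model."""
    random.seed(sum(prefix) + 1)
    return random.randrange(99)


def assisted_generation(prompt, max_new_tokens=20, num_candidates=5):
    sequence = list(prompt)
    new_tokens = 0
    while new_tokens < max_new_tokens:                      # outer loop (main model)
        # Inner loop: the assistant proposes a short candidate continuation.
        candidates = []
        for _ in range(num_candidates):
            candidates.append(toy_assistant_model(sequence + candidates))

        # The main model checks the candidates and accepts the longest matching prefix.
        n_matches = 0
        for token in candidates:
            if toy_main_model(sequence + candidates[:n_matches]) == token:
                n_matches += 1
            else:
                break

        if n_matches > 0:
            # Code path 1: accept the matched candidate tokens.
            accepted = candidates[:n_matches]
        else:
            # Code path 2: fall back to a single token from the main model.
            accepted = [toy_main_model(sequence)]

        sequence.extend(accepted)
        new_tokens += len(accepted)
    return sequence


print(assisted_generation([1, 2, 3]))
```

With random toy models, most outer loop iterations land in the `n_matches == 0` path, which is exactly why the untested `n_matches > 0` multi-iteration case was rarely exercised and the CI only failed intermittently.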