Fix length related warnings in speculative decoding #29585

Merged: 16 commits into huggingface:main on Apr 10, 2024

Conversation

@zucchini-nlp (Member) commented on Mar 11, 2024

What does this PR do?

Currently, if we pass a min_length or min_new_tokens to speculative decoding, we get a bunch of warnings like:

UserWarning: Unfeasible length constraints: min_new_tokens (34), when added to the prompt length (66), is larger than the maximum possible length (75)....

This PR adds a min_new_tokens argument to the candidate generator's generate call, which defaults to 0 if no min_length is passed by the user.
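
For illustration, a minimal sketch of the kind of call that triggered the warning (models, prompt, and lengths are illustrative, not taken from the PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
assistant = AutoModelForCausalLM.from_pretrained("distilgpt2")  # shares gpt2's tokenizer

inputs = tokenizer("The quick brown fox jumps over", return_tensors="pt")

# Before this fix, the assistant inherited the user's length constraints
# verbatim; since it only proposes a handful of candidate tokens per call,
# the "Unfeasible length constraints" UserWarning fired on every step.
outputs = model.generate(
    **inputs,
    assistant_model=assistant,
    min_new_tokens=34,
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```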

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@gante

fixes #29860


@gante (Member) left a comment

Thank you for fixing! 👍

@@ -1501,7 +1501,7 @@ def generate(
        )

        # 12. run assisted generate
-       result = self.assisted_decoding(
+       result = self._assisted_decoding(
Member

good catch, this was surely throwing a deprecation warning 👍

@gante requested a review from amyeroberts on March 11, 2024 at 15:51
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
@amyeroberts (Collaborator) left a comment

Thanks for working on this!

A few questions about the intended behaviour - in particular, why the config values are forcibly reset and an instance attribute is used instead.

Some tests to make sure min_new_tokens has the intended behaviour would also be good.

@@ -3252,6 +3252,28 @@ def test_default_max_length_warning(self):
        model.generate(input_ids)
        self.assertEqual(len(warning_list), 0)

+   def test_length_warning_assisted_generation(self):
Collaborator

There should also be a test that the min_new_tokens parameter behaves as expected, especially when max_new_tokens is also set and when it's not set at all, i.e. goes to its default value.

Member Author

Added one more test checking that the generated length falls between the min and max lengths.
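
A rough sketch of that check (models and bounds are illustrative, not the exact test added in this PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
assistant = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Hello world", return_tensors="pt")
input_length = inputs["input_ids"].shape[-1]

out = model.generate(
    **inputs, assistant_model=assistant, min_new_tokens=5, max_new_tokens=20
)

# The number of freshly generated tokens must respect both bounds.
new_tokens = out.shape[-1] - input_length
assert 5 <= new_tokens <= 20
```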

@@ -157,6 +157,13 @@ def __init__(
        self.generation_config.return_dict_in_generate = True
        self.generation_config.output_scores = True

+       # avoid unnecessary warnings that min_length is larger than max_new_tokens
+       input_length = input_ids.shape[-1]
+       min_new_tokens = self.generation_config.min_new_tokens if self.generation_config.min_new_tokens else 0
Collaborator

Are we guaranteed that self.generation_config has this attribute?

Member Author

Yes, the GenerationConfig usually initializes those to 0 or None if not indicated by the user. So we check that min_new_tokens is not None, and then set min_length to the maximum (the code says min, I'll fix it) of the user-defined value and the default 0.

Member

@amyeroberts AFAIK only Whisper (and perhaps other audio models?) uses attributes that may not exist in a generation_config; it is a fairly regular object with everything initialized in __init__ :D

Collaborator

OK, and can we ever expect to have Whisper generation configs used here, or does the model always just use its own custom generation code?

+       min_new_tokens = self.generation_config.min_new_tokens if self.generation_config.min_new_tokens else 0
Collaborator

Is this checking for None-ness? Or can it be False? Otherwise, defaulting to 0 if it's 0 is a superfluous check.

Member Author

It's checking for None-ness; I'll make it explicit with is not None.

@@ -175,6 +182,7 @@ def get_candidates(self, input_ids: torch.LongTensor) -> Tuple[torch.LongTensor,
        # Don't generate more than `max_length - 1` candidates since the target model generates one extra token.
        new_cur_len = input_ids.shape[-1]
        max_new_tokens = min(int(self.num_assistant_tokens), self.generation_config.max_length - new_cur_len - 1)
+       min_new_tokens = min(max_new_tokens, self.min_length - new_cur_len)
Collaborator

Won't this result in negative values?

If I've generated more tokens than self.min_length, i.e. new_cur_len > self.min_length, then self.min_length - new_cur_len is negative.

Member Author

Good point, I'd missed it.
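
For concreteness, a sketch of the clamping that addresses this; the variable names follow the diff above, but the exact committed line may differ:

```python
def candidate_min_new_tokens(min_length: int, max_new_tokens: int, new_cur_len: int) -> int:
    # Clamp at zero: once generation is already past `min_length`,
    # the assistant has no minimum left to satisfy.
    return max(min(max_new_tokens, min_length - new_cur_len), 0)

# Already past the minimum: no constraint, rather than a negative value.
assert candidate_min_new_tokens(min_length=20, max_new_tokens=5, new_cur_len=25) == 0
# Two tokens still needed to reach the minimum.
assert candidate_min_new_tokens(min_length=20, max_new_tokens=5, new_cur_len=18) == 2
```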

+       min_new_tokens = min(max_new_tokens, self.min_length - new_cur_len)
Collaborator

Why use a class attribute and not the generation config, as is done for max_length?

@zucchini-nlp (Member Author) commented on Mar 11, 2024

Oh, this is because for maximum length we deprecated generation_config.max_new_tokens, so we can use the only possible attribute for max length. Yet for minimum length we have two attributes, both of which are equally valid. That's why in __init__ we manually set min_length by checking both attributes, if they are set by the user.

EDIT: I just remembered why I did it that way. We have to set generation_config's min_length to 0 in __init__; that's required to avoid unnecessary warnings. That's why I saved it as a class attribute. Otherwise, generation would receive kwargs like the following and throw warnings:
{"min_length": 20, "min_new_tokens": 5, "max_new_tokens": 5}

@gante, btw, don't you think we can also deprecate min_new_tokens?

Member

@zucchini-nlp it's the other way around; if anything, we would want to deprecate the min_length argument/config option :) max_new_tokens and min_new_tokens are much more predictable from a user's point of view, as the user doesn't need to be concerned with the input length. In the past, before max_new_tokens and min_new_tokens were introduced, we would often get issues from confused users.

Inside generate, however, it is much easier to use the total length for control (no need to track the input length). We set max_length from max_new_tokens when needed (here), perhaps we should do the same with min_length to simplify the procedure in this PR :)
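
A sketch of the conversion being described, done once near the top of generate (illustrative, not the exact library code):

```python
def resolve_length_controls(generation_config, input_length: int) -> None:
    """Translate user-facing `*_new_tokens` controls into total-length controls,
    so the rest of `generate` only has to track the total length."""
    if generation_config.max_new_tokens is not None:
        generation_config.max_length = generation_config.max_new_tokens + input_length
    if generation_config.min_new_tokens is not None:
        generation_config.min_length = generation_config.min_new_tokens + input_length
```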

@zucchini-nlp (Member Author) commented on Mar 13, 2024

I decided to make a separate method for all length-related corrections. Added one more test for min length, the same as we have for max length.
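
Roughly, such a consolidated helper on the candidate generator might look like this (name and exact logic are illustrative, not copied from the merged commit):

```python
def _calculate_candidate_lengths(self, input_ids):
    """Per-step candidate length bounds derived from the generation config."""
    cur_len = input_ids.shape[-1]
    # Leave room for the one extra token the target model appends.
    max_new_tokens = min(
        int(self.num_assistant_tokens), self.generation_config.max_length - cur_len - 1
    )
    # Clamped at zero so an already-satisfied minimum never goes negative.
    min_new_tokens = max(
        min(max_new_tokens, self.generation_config.min_length - cur_len), 0
    )
    return min_new_tokens, max_new_tokens
```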

@gante (Member) left a comment

The recent changes look good to me. Added a nit :)

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
@amyeroberts (Collaborator)

@zucchini-nlp It looks like a lot of tests are failing at the moment because of the missing min_length attribute.

@zucchini-nlp (Member Author)

Oops, I accepted the last suggestion and did not fix the naming in other places in the code. Now it should work; at least locally, the tests in "generation" were passing.

@zucchini-nlp (Member Author)

@amyeroberts this one is ready to re-review 😃

@amyeroberts (Collaborator) left a comment

Thanks for all the work iterating on this and improving our warnings!

@zucchini-nlp (Member Author)

@amyeroberts can you merge, please? The failing TF tests seem to be unrelated.

@amyeroberts (Collaborator)

As the failing tests are for generation, there's a very small chance of some interaction between the changes here and those tests (mainly because the test implementation isn't TF-specific).

They should now be resolved on main and will go away with a quick rebase :)

@zucchini-nlp merged commit 4157976 into huggingface:main on Apr 10, 2024
21 checks passed
Successfully merging this pull request may close these issues:

assisted_decoding called directly inside generate triggering warning to use when it shouldn't