Fix length related warnings in speculative decoding #29585
Are we guaranteed the `self.generation_config` has this attribute?
Yes, the `GenerationConfig` usually initializes those to 0 or None if not indicated by the user. So we check that `min_new_tokens` is not None and then set `min_length` to the maximum (the code says min, I'll fix it) of the user-defined value and the default 0.
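For illustration, a minimal sketch of the check described above, assuming `generation_config` and `input_length` are in scope at that point (names are illustrative, not necessarily the PR's exact code):

```python
# Only override the default when the user actually set min_new_tokens;
# otherwise fall back to the (default 0) min_length.
min_length = generation_config.min_length or 0
if generation_config.min_new_tokens is not None:
    min_length = max(min_length, input_length + generation_config.min_new_tokens)
```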
@amyeroberts AFAIK only Whisper (and perhaps other audio models?) uses attributes that may not exist in a `generation_config`; it is a fairly regular object with everything initialized in `__init__` :D
OK, and can we ever expect to have Whisper generation configs being used here, or does the model always just use its own custom generation code?
Is this checking for None-ness? Or can it be `False`? Otherwise, defaulting to 0 if it's 0 is a superfluous check.
For None-ness, I'll specify it as `not None`.
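Purely as a sketch of the distinction being discussed (not the PR's code):

```python
# Truthiness check: a user-supplied 0 (or False) would also be replaced.
min_new_tokens = generation_config.min_new_tokens or 0

# Explicit None check: only an unset value falls back to the default.
min_new_tokens = (
    generation_config.min_new_tokens
    if generation_config.min_new_tokens is not None
    else 0
)
```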
Won't this result in negative values? If I've generated more tokens than `self.min_length`, i.e. `new_cur_len > self.min_length`?
good point, I've missed it
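One way to guard against that, as a hedged sketch (the clamp and the names below are illustrative, not necessarily the exact fix that landed):

```python
# Clamp at zero so the remaining minimum never goes negative once the
# sequence has already grown past self.min_length.
remaining_min = max(0, self.min_length - new_cur_len)
```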
Why use a class attribute and not the generation config, as for `max_length`?
Oh, this is because for maximum length we deprecated `generation_config.max_new_tokens`, so we can use the only possible attribute for max length. Yet, for minimum length we have two attributes, both of which are equally valid. That's why in `__init__` we manually set the `min_length` by checking both attributes, if they are set by the user.

EDIT: I just remembered why I did it that way. We have to set the generation config's `min_length` to 0 in `__init__`; that's required to avoid unnecessary warnings. That's why I saved it as a class attribute. Otherwise, generation would receive kwargs like below and throw warnings: `{"min_length": 20, "min_new_tokens": 5, "max_new_tokens": 5}`
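A sketch of the `__init__`-time handling described in the EDIT above, assuming the candidate generator keeps its own copy of the config (the class, attribute, and parameter names are illustrative, not the PR's):

```python
import copy

class AssistedCandidateGeneratorSketch:
    """Illustrative sketch only; not the actual class from the PR."""

    def __init__(self, generation_config, input_ids_length):
        # Keep the resolved minimum total length on the instance...
        self.generation_config = copy.deepcopy(generation_config)
        self.min_length = max(
            self.generation_config.min_length or 0,
            input_ids_length + (self.generation_config.min_new_tokens or 0),
        )
        # ...then neutralize the length fields on the copied config so that
        # generate() is not handed conflicting kwargs such as
        # {"min_length": 20, "min_new_tokens": 5, "max_new_tokens": 5}.
        self.generation_config.min_length = 0
        self.generation_config.min_new_tokens = None
```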
@gante, btw, don't you think we can also deprecate `min_new_tokens`?
@zucchini-nlp it's the other way around, if anything we would want to deprecate the `min_length` argument/config option :) `max_new_tokens` and `min_new_tokens` are much more predictable from a user's point of view, as the user doesn't need to be concerned with the input length. In the past, before `max_new_tokens` and `min_new_tokens` were introduced, we would often get issues from confused users.

Inside generate, however, it is much easier to use the total length for control (no need to track the input length). We set `max_length` from `max_new_tokens` when needed (here), perhaps we should do the same with `min_length` to simplify the procedure in this PR :)
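A minimal sketch of that generate-side conversion, assuming it runs where `input_ids` and `generation_config` are available, mirroring how `max_length` is derived from `max_new_tokens` (illustrative, not the PR's exact code):

```python
# Translate the user-facing "new tokens" bound into a total-length bound
# once the prompt length is known, then rely only on total lengths internally.
input_length = input_ids.shape[-1]
if generation_config.min_new_tokens is not None:
    generation_config.min_length = input_length + generation_config.min_new_tokens
    generation_config.min_new_tokens = None
```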
I decided to make a separate method for all length-related corrections. I also added one more test for min length, the same as we have for max length.
good catch, this was surely throwing a deprecation warning 👍
There should also be a test that the `min_new_tokens` parameter behaves as expected, especially when `max_new_tokens` is also set and when it's not set at all, i.e. goes to the default value.
I added one more test checking that the output length is in the range between the min and max lengths.
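Roughly what such a test might look like, as a hedged sketch (the fixtures `model`, `assistant_model`, and `input_ids` and the exact bounds are assumptions, not the PR's actual test):

```python
def test_assisted_generation_length_is_within_bounds(self):
    # Sketch: `model`, `assistant_model`, and `input_ids` are assumed to be
    # provided by the surrounding test class.
    input_length = input_ids.shape[-1]
    output = model.generate(
        input_ids,
        assistant_model=assistant_model,
        min_new_tokens=10,
        max_new_tokens=20,
    )
    # The number of newly generated tokens should respect both bounds.
    new_tokens = output.shape[-1] - input_length
    self.assertGreaterEqual(new_tokens, 10)
    self.assertLessEqual(new_tokens, 20)
```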