-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add heuristic algorithm for speculative #3006
Merged
ggerganov
merged 6 commits into
ggerganov:master
from
leng-yue:add-speculative-heuristic
Sep 14, 2023
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
98230ef
Add heuristic algo for speculative
leng-yue 9248528
Constrain minimum n_draft to 2
leng-yue dddd784
speculative : improve heuristic impl
ggerganov 4e6e951
Merge branch 'master' into HEAD
ggerganov d9559b7
speculative : be more rewarding upon guessing max drafted tokens
ggerganov b5efa62
speculative : fix typos
ggerganov File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leng-yue
I've refactored the implementation to be more contained.
Also, we were rewarding the draft even when it hasn't sampled all
n_draft
tokens, which does not seem correct.For example, let's say
n_draft
is 16 currently, but the draft samples just 3 tokens because the "low-probability" check has been triggered. Even if all 3 tokens were accepted, we should not reward the draft model, because this is just a small part of what we asked it to do.Regarding the reproducibility - we will study this more in #3014
My guess is that this behavior would occur even without the heuristic - probably it's just less likely to happen for some reason when the heuristic is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my earlier implementation, rewards were given only when all tokens were accepted. So if only 3 out of 16 tokens are accepted, the n_tokens value would be decreased by 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly. This check does not guarantee you have
n_draft
tokens accepted:llama.cpp/examples/speculative/speculative.cpp
Lines 146 to 148 in 98230ef
The reason is because
drafted.size() <= n_draft
due to another heuristic of not drafting more tokens if the drafter becomes "unsure":llama.cpp/examples/speculative/speculative.cpp
Lines 192 to 196 in 98230ef
So in the majority of cases, when
all_accepted == true
, you would have accepted less thann_draft
tokens. That's why yourn_draft
would increase so much even beyond 70 in some cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.