Sparse tree #42
Conversation
update roadmap
This reverts commit 922689a.
-    Returns:
-    - candidates (torch.Tensor): Cartesian product of candidate tokens across Medusa layers.
-    - tree_candidates (torch.Tensor): Reshaped candidates matched to the tree structure.
-    """
-    # Greedy decoding for original logits
-    candidates = [torch.argmax(logits[:, -1]).unsqueeze(0)]
-    for i in range(medusa_logits.shape[0]):
-        candidate_i = torch.topk(medusa_logits[i, 0, -1], medusa_topk[i]).indices
-        candidates.append(candidate_i)
-    candidates_flat = torch.cat(candidates)
-    candidates = torch.cartesian_prod(*candidates)
-    tree_candidates = candidates_flat[tree_indices].unsqueeze(0)
-    return candidates, tree_candidates
+    # Extract the TOPK candidates from the medusa logits.
+    candidates_medusa_logits = torch.topk(medusa_logits[:, 0, -1], TOPK, dim=-1).indices
+
+    # Combine the selected candidate from the original logits with the topk medusa logits.
+    candidates = torch.cat([candidates_logit, candidates_medusa_logits.view(-1)], dim=-1)
+
+    # Map the combined candidates to the tree indices to get tree candidates.
+    tree_candidates = candidates[tree_indices]
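For intuition about the shapes involved, here is a minimal, self-contained sketch of the new candidate-generation path (the tensor sizes and the toy `tree_indices` are invented for illustration; in the repo these buffers are derived from `medusa_choices`):

```python
import torch

TOPK = 3
num_heads, seq_len, vocab = 2, 8, 32000
medusa_logits = torch.randn(num_heads, 1, seq_len, vocab)  # (heads, batch, seq, vocab)
logits = torch.randn(1, seq_len, vocab)                    # base-model logits

# Greedy token from the base model at the last position.
candidates_logit = torch.argmax(logits[:, -1]).unsqueeze(0)

# Top-K tokens per Medusa head at the last position: shape (num_heads, TOPK).
candidates_medusa_logits = torch.topk(medusa_logits[:, 0, -1], TOPK, dim=-1).indices

# One flat vector: [base, head0 top-3, head1 top-3] -> length 1 + 2 * 3 = 7.
candidates = torch.cat([candidates_logit, candidates_medusa_logits.view(-1)], dim=-1)

# tree_indices gathers entries of `candidates` in tree-visit order (toy layout here).
tree_indices = torch.tensor([0, 1, 2, 4, 5])
tree_candidates = candidates[tree_indices]  # tokens the base model verifies in one pass
```

Unlike the removed cartesian-product version, the full tree is never materialized as an explicit product of per-head candidates; the gather through `tree_indices` selects only the paths the sparse tree keeps.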
What if we pruned low probability subtrees before verifying with the base model? This would give the benefits of sparse tree attention without relying on manually specifying the sparse tree paths in `medusa_choices`.
Concretely,

- a path's probability is the product of its nodes' probabilities according to `medusa_logits` (so it's not a formal verification of the sequence probability like running the base model, but we may be able to prune quite a few very low probability subtrees)
- tree indices are updated to reflect the paths remaining after pruning
- `generate_medusa_buffers` creates buffers for a smaller size than the exponentially growing tree, so we prune to fit that size (see the sketch after this list)
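A rough sketch of the pruning step (the helper name, the path encoding, and `max_paths` are assumptions for illustration, not existing repo API):

```python
import torch

def prune_medusa_choices(medusa_choices, medusa_logits, max_paths):
    """Keep the `max_paths` candidate paths with the highest estimated probability.

    medusa_choices: list of paths; each path is a tuple of top-k ranks, one per
                    depth, e.g. (0, 1) = head 0's top-1 token, then head 1's top-2.
    medusa_logits:  raw head outputs, shape (num_heads, batch, seq_len, vocab).
    """
    # Per-head probabilities over the vocabulary at the last position (batch 0).
    probs = torch.softmax(medusa_logits[:, 0, -1], dim=-1)
    # Probability of each head's k-th ranked token (topk sorts descending).
    k = max(max(path) for path in medusa_choices) + 1
    topk_probs = torch.topk(probs, k, dim=-1).values

    def path_prob(path):
        # A path's score is the product of its nodes' probabilities.
        p = 1.0
        for depth, rank in enumerate(path):
            p *= topk_probs[depth, rank].item()
        return p

    return sorted(medusa_choices, key=path_prob, reverse=True)[:max_paths]
```

The surviving paths would then be fed to `generate_medusa_buffers` to rebuild `tree_indices` and the attention mask at the smaller, fixed size.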
What's the intuition for why this should improve decoding speed?
Medusa may excel in cases where the probability density of the tree is heavily imbalanced, i.e., when an easy subsequence is coming up and the Medusa heads are confident in their predictions. But in other cases, the deeper Medusa heads are uncertain, and we're largely wasting computation verifying deep parts of the subtree with full attention.
It would be interesting to see how much the shape of the tree's probability density changes across different contexts. If it has a large variance, it seems like it would be valuable to have a more dynamic sparse tree that takes an optimal shape based on the current context.
Basically, this could allow us to
- crank up the number of Medusa heads/size of the tree
- only verify deep paths of the tree with full tree attention if there's a decent Medusa probability
- not pay for a massive tree in cases where deep Medusa heads are uncertain
#34