Add flash decode v2 #340

harsh-nod · 2024-12-16T17:41:48Z

This PR adds support for a new variant of flash decoding. In this version we use a split-K approach, where the first kernel computes the attention, parallelizing over the sequence length, and the second kernel sums up all the results. To make this work, we had to add support for the following:

SymbolicAliases - these are language constructs that allow us to say that one symbol is related to another symbol through a specific relationship. This is implemented by constructing additional workgroup constraints, tiling constraints and wave constraints. They are ignored during expansion.
A template for flash decoding that is used in both the lit test and the end-to-end tests.
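
For readers unfamiliar with the split-K idea, the two phases described above can be sketched in plain Python. This is an illustration only, not the kernels added in this PR: the function names (`phase1_partial`, `phase2_combine`, `flash_decode_split_k`) are hypothetical, a single query vector is assumed, and the second phase is assumed to use the usual log-sum-exp rescaling before summing the partial results, which the description above only summarizes as "sums up all the results".

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query vector, used as a baseline."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(dim)]


def phase1_partial(q, keys, values):
    """Phase 1 (assumed form): attention over one chunk of the sequence.

    Returns the chunk's max score, its sum of exponentials, and the
    unnormalized weighted sum of the values.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def phase2_combine(partials):
    """Phase 2 (assumed form): rescale each partial result and sum them."""
    global_max = max(m for m, _, _ in partials)
    dim = len(partials[0][2])
    denom = 0.0
    out = [0.0] * dim
    for m, exp_sum, acc in partials:
        scale = math.exp(m - global_max)  # bring chunk onto the global scale
        denom += scale * exp_sum
        for d in range(dim):
            out[d] += scale * acc[d]
    return [o / denom for o in out]


def flash_decode_split_k(q, keys, values, num_splits):
    """Split the sequence into chunks, run phase 1 on each, then phase 2."""
    n = len(keys)
    chunk = (n + num_splits - 1) // num_splits  # ceiling division
    partials = [phase1_partial(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]
    return phase2_combine(partials)
```

In a GPU implementation each call to `phase1_partial` would run in parallel over its slice of the sequence; here the chunks are processed in a loop so that the arithmetic is easy to compare against `attention_reference`.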

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

harsh-nod force-pushed the decodev2 branch 12 times, most recently from 9caab52 to f8f5966 Compare December 18, 2024 07:36

harsh-nod requested review from Hardcode84, raikonenfnu and martin-luecke December 18, 2024 08:17

harsh-nod force-pushed the decodev2 branch from de1706f to 61bd0cc Compare December 18, 2024 23:01

harsh-nod added 2 commits December 18, 2024 15:01

Add flash decode v2

bde09df

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

Refactor and cleanup

0edf8ac

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

harsh-nod force-pushed the decodev2 branch from 61bd0cc to 0edf8ac Compare December 18, 2024 23:02

More cleanups

ed7d747

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

harsh-nod force-pushed the decodev2 branch from 5daf71b to ed7d747 Compare December 19, 2024 15:46

Hardcode84 approved these changes Dec 20, 2024

harsh-nod merged commit 20507b7 into iree-org:main Dec 20, 2024
10 checks passed
