
Add flash decode v2 #340

Merged
merged 3 commits into from
Dec 20, 2024


harsh-nod (Contributor) commented Dec 16, 2024

This PR adds support for a new variant of flash decoding. This version uses a split-K approach: the first kernel computes attention in parallel over chunks of the sequence length, and the second kernel sums up (reduces) the partial results from all chunks. To make this work, we had to add support for the following:

  • SymbolicAliases: language constructs that let us declare that one symbol is associated with another symbol through a specific relationship. They are implemented by constructing additional workgroup, tiling, and wave constraints, and they are ignored during expansion.
  • A flash decoding template, shared by the lit test and the end-to-end tests.
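The two-kernel split-K scheme described above can be sketched in plain Python for a single query vector. This is a minimal illustration, not the kernel from this PR: it assumes one query per head and uses the usual log-sum-exp style combine, where each split also returns its local maximum and sum of exponentials so that the second phase can rescale the partial sums before adding them. The function names (`phase1_partial`, `phase2_reduce`, `reference_attention`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def reference_attention(q: Vector, k: Matrix, v: Matrix) -> Vector:
    """softmax(q . K^T / sqrt(d)) @ V for one query, computed in a single pass."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, row) * scale for row in k]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * row[j] for w, row in zip(weights, v)) / total
            for j in range(len(v[0]))]


def phase1_partial(q: Vector, k: Matrix, v: Matrix,
                   num_splits: int) -> List[Tuple[float, float, Vector]]:
    """Phase 1 (hypothetical): each split handles a contiguous chunk of the
    sequence and returns (local max, sum of exps, unnormalized weighted V sum)."""
    scale = 1.0 / math.sqrt(len(q))
    seq_len = len(k)
    chunk = math.ceil(seq_len / num_splits)
    partials = []
    for start in range(0, seq_len, chunk):
        end = min(start + chunk, seq_len)
        scores = [dot(q, k[i]) * scale for i in range(start, end)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        acc = [sum(w * v[start + i][j] for i, w in enumerate(weights))
               for j in range(len(v[0]))]
        partials.append((m, sum(weights), acc))
    return partials


def phase2_reduce(partials: List[Tuple[float, float, Vector]]) -> Vector:
    """Phase 2 (hypothetical): rescale every split to a common max, then sum."""
    global_max = max(m for m, _, _ in partials)
    dim = len(partials[0][2])
    denom = 0.0
    out = [0.0] * dim
    for m, s, acc in partials:
        factor = math.exp(m - global_max)  # exp(s_i - m) * factor == exp(s_i - global_max)
        denom += s * factor
        for j in range(dim):
            out[j] += acc[j] * factor
    return [x / denom for x in out]


if __name__ == "__main__":
    q = [0.1, 0.2, 0.3, 0.4]
    k = [[((i * 7 + j) % 5) / 5.0 for j in range(4)] for i in range(10)]
    v = [[(i + 2 * j) % 3 - 1.0 for j in range(3)] for i in range(10)]
    split = phase2_reduce(phase1_partial(q, k, v, num_splits=3))
    ref = reference_attention(q, k, v)
    print(max(abs(a - b) for a, b in zip(split, ref)))
```

Because the first phase keeps each split's maximum and exponential sum, the second phase only needs elementwise rescaling and addition, which is what lets the sequence dimension be parallelized without changing the result (up to floating-point rounding).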
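The SymbolicAlias idea can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the Wave API: the classes `Constraint` and `SymbolicAlias`, the `from_alias` flag, and the helper functions are invented for illustration. It assumes an alias is defined by a source symbol plus a function mapping the source's tile size to the alias's tile size; for every workgroup, tiling, or wave constraint on the source, a matching constraint is derived for the alias, and derived constraints are filtered out before expansion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Constraint:
    # Hypothetical stand-in for a workgroup/tiling/wave constraint:
    # "dimension `dim` is split into tiles of size `tile_size`".
    kind: str  # "workgroup", "tiling" or "wave"
    dim: str
    tile_size: int
    from_alias: bool = False  # True if derived from a SymbolicAlias


@dataclass(frozen=True)
class SymbolicAlias:
    # Hypothetical: `alias` relates to `source` via `relation` on tile sizes.
    alias: str
    source: str
    relation: Callable[[int], int]

    def derive(self, constraints: List[Constraint]) -> List[Constraint]:
        """Create one alias constraint per constraint on the source symbol."""
        return [
            Constraint(c.kind, self.alias, self.relation(c.tile_size), from_alias=True)
            for c in constraints
            if c.dim == self.source
        ]


def add_alias_constraints(constraints: List[Constraint],
                          aliases: List[SymbolicAlias]) -> List[Constraint]:
    """Append derived constraints for every alias (originals are kept first)."""
    result = list(constraints)
    for a in aliases:
        result.extend(a.derive(constraints))
    return result


def constraints_for_expansion(constraints: List[Constraint]) -> List[Constraint]:
    """Drop alias-derived constraints, modeling 'ignored during expansion'."""
    return [c for c in constraints if not c.from_alias]


if __name__ == "__main__":
    base = [
        Constraint("workgroup", "K2", 64),
        Constraint("wave", "K2", 16),
        Constraint("tiling", "N", 32),
    ]
    # Hypothetical alias: K2_SPLIT uses half the tile size of K2.
    aliases = [SymbolicAlias("K2_SPLIT", "K2", lambda t: t // 2)]
    for c in add_alias_constraints(base, aliases):
        print(c)
```

Keeping the derived constraints in the same list, tagged rather than stored separately, is one way to let later passes see the alias's tiling while expansion still works only from the user-written constraints.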

Signed-off-by: Harsh Menon <harsh@nod-labs.com>
harsh-nod merged commit 20507b7 into iree-org:main on Dec 20, 2024
10 checks passed