[GPU] Use affine.delinearize_index for MMA tiles and vector distribution #19228

Merged Jan 28, 2025 (8 commits)

@krzysz00 krzysz00 commented Nov 20, 2024

This commit updates some by-hand delinearizations in MMA tile generation and vector distribution to use `affine.delinearize_index` instead.

The main tricky thing here is that a lot of that MMA code would use `(id / stride) % size`, whereas delinearize's outputs all have the form `(id % stride) / nextStride`. In all the cases at issue, a utility can convert the arrays of sizes and strides into a delinearization basis plus a permutation of its results.

In order to not break existing tests, the trivial-loop detector had to be manually instrumented to support `delinearize_index` (and I added `util.assume.int` support while I was there). I suspect there are a few other cases and that, long-term, that detector should be using one of the bounds interfaces, but that's not this PR.
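To make the sizes-and-strides conversion concrete, here is a minimal standalone sketch, not the code in this PR. The names `byHand`, `delinearize`, `BasisAndPerm`, and `basisFromSizesAndStrides` are hypothetical, and the sketch assumes the linear id lies in `[0, product(sizes))` and that the strides describe a dense layout with no gaps. It sorts the dimensions by descending stride to get a basis, and records which delinearization result each original dimension maps to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

using Vec = std::vector<int64_t>;

// The by-hand form: one (id / stride) % size per dimension.
Vec byHand(int64_t id, const Vec &sizes, const Vec &strides) {
  assert(sizes.size() == strides.size());
  Vec out(sizes.size());
  for (size_t i = 0; i < sizes.size(); ++i)
    out[i] = (id / strides[i]) % sizes[i];
  return out;
}

// Reference semantics for delinearizing `id` over `basis` (outermost first),
// assuming 0 <= id < product(basis).
Vec delinearize(int64_t id, const Vec &basis) {
  Vec out(basis.size());
  for (size_t i = basis.size(); i-- > 0;) {
    out[i] = id % basis[i];
    id /= basis[i];
  }
  return out;
}

// Hypothetical result type: a basis plus, for each original dimension, the
// index of the delinearization result that holds its value.
struct BasisAndPerm {
  Vec basis;
  std::vector<size_t> dimToResult;
};

// Hypothetical helper: returns std::nullopt when the strides do not describe
// a dense layout. Stride-0 dimensions are not handled in this sketch.
std::optional<BasisAndPerm> basisFromSizesAndStrides(const Vec &sizes,
                                                     const Vec &strides) {
  if (sizes.size() != strides.size())
    return std::nullopt;
  std::vector<size_t> order(sizes.size());
  std::iota(order.begin(), order.end(), size_t{0});
  // Outermost (largest stride) first; on ties, size-1 dims go innermost.
  std::stable_sort(order.begin(), order.end(), [&](size_t a, size_t b) {
    return strides[a] != strides[b] ? strides[a] > strides[b]
                                    : sizes[a] > sizes[b];
  });
  BasisAndPerm result;
  result.basis.resize(sizes.size());
  result.dimToResult.resize(sizes.size());
  int64_t expectedStride = 1;
  // Walk from innermost to outermost, checking each stride is the product
  // of all the sizes inside it.
  for (size_t pos = order.size(); pos-- > 0;) {
    size_t dim = order[pos];
    if (strides[dim] != expectedStride)
      return std::nullopt;
    result.basis[pos] = sizes[dim];
    result.dimToResult[dim] = pos;
    expectedStride *= sizes[dim];
  }
  return result;
}
```

For example, sizes `{4, 2, 8}` with strides `{8, 32, 1}` give the basis `(2, 4, 8)`, and dimension `d` of the by-hand form is read from result `dimToResult[d]` of the delinearization.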

@krzysz00 krzysz00 force-pushed the users/krzysz00/gpu-distribute-with-linearize branch from 5328767 to 5a8fa83 Compare November 21, 2024 20:54
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch from 829c3d5 to aaf6cf8 Compare November 21, 2024 22:30
@krzysz00 krzysz00 force-pushed the users/krzysz00/gpu-distribute-with-linearize branch from 5a8fa83 to 291f570 Compare November 26, 2024 19:24
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch from aaf6cf8 to 3ccf6a4 Compare November 26, 2024 19:28
Base automatically changed from users/krzysz00/gpu-distribute-with-linearize to main November 26, 2024 20:38
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch 4 times, most recently from ba7bc66 to d577300 Compare December 17, 2024 23:07
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch from 31e58d0 to 5592a62 Compare January 3, 2025 22:12

krzysz00 commented Jan 3, 2025

Update: staring at tests showed that I should go implement the value bounds op interfaces on `affine.delinearize_index` and `affine.linearize_index`, because there were some single-iteration loops that weren't getting eliminated.
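As a rough illustration of why bounds on these ops matter for single-iteration loops, here is a standalone sketch. It is not the upstream interface implementation: `Range`, `delinearizeResultRange`, `linearizeResultRange`, and `runsExactlyOnce` are hypothetical names, and it assumes the linear index is in `[0, product(basis))` and that every linearize input is within its basis element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive integer range [min, max].
struct Range {
  int64_t min;
  int64_t max;
};

// Result `i` of a delinearization over `basis` lies in [0, basis[i]),
// assuming the linear index is in [0, product(basis)).
Range delinearizeResultRange(const std::vector<int64_t> &basis, size_t i) {
  assert(i < basis.size());
  return {0, basis[i] - 1};
}

// A linearization over `basis` lies in [0, product(basis)), assuming each
// input index i is in [0, basis[i]).
Range linearizeResultRange(const std::vector<int64_t> &basis) {
  int64_t product = 1;
  for (int64_t b : basis)
    product *= b;
  return {0, product - 1};
}

// `for (iv = lb; iv < ub; iv += step)` runs exactly once for every lb in the
// given range when the first iteration always executes (lb.max < ub) and the
// second never does (lb.min + step >= ub).
bool runsExactlyOnce(Range lb, int64_t ub, int64_t step) {
  return step > 0 && lb.max < ub && lb.min + step >= ub;
}
```

With a basis of `(2, 4, 8)`, a loop that starts at the middle delinearization result and runs to 4 with step 4 is provably single-iteration, whereas the same loop running to 8 is not.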

@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch from 8e5ebb0 to de7e313 Compare January 7, 2025 00:13
@krzysz00 krzysz00 marked this pull request as ready for review January 7, 2025 00:21
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch 2 times, most recently from 0277d07 to 3ff65f0 Compare January 13, 2025 21:14
@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch 4 times, most recently from 12377f8 to d8e695c Compare January 27, 2025 17:38

@qedawkins qedawkins left a comment

Awesome cleanup! Overall LGTM, one comment about how subgroup tiles are handled here because in the past they didn't actually share any link with subgroup size/number of subgroups.

compiler/src/iree/compiler/Utils/Indexing.cpp
///
/// As a special case, dimensions with stride 0 are treated as size-1
/// dimensions that are placed at the end of the delinearization, from where
/// they will canonicalize to 0.

❤️ for the great docs
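The stride-0 special case in that doc comment can be sketched as follows. This is a simplified standalone illustration, not the code under review: `Basis`, `basisWithBroadcastDims`, and `delinearize` are hypothetical names, the density checks are omitted, and the linear id is assumed to be in range. A stride-0 dimension cannot go through `(id / stride) % size`, so it becomes a trailing size-1 basis element whose result is `id % 1`, which is always 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Vec = std::vector<int64_t>;

// Hypothetical result type: basis plus the result index for each dimension.
struct Basis {
  Vec basis;
  std::vector<size_t> dimToResult;
};

// Hypothetical helper (no density checks): builds a basis in which every
// stride-0 dimension becomes a size-1 element at the end.
Basis basisWithBroadcastDims(const Vec &sizes, const Vec &strides) {
  assert(sizes.size() == strides.size());
  std::vector<size_t> order(sizes.size());
  std::iota(order.begin(), order.end(), size_t{0});
  // Descending stride order places all stride-0 dims last (innermost).
  std::stable_sort(order.begin(), order.end(), [&](size_t a, size_t b) {
    return strides[a] > strides[b];
  });
  Basis out;
  out.dimToResult.resize(sizes.size());
  for (size_t pos = 0; pos < order.size(); ++pos) {
    size_t dim = order[pos];
    // A broadcast dim contributes a size-1 element, whatever its nominal size.
    out.basis.push_back(strides[dim] == 0 ? 1 : sizes[dim]);
    out.dimToResult[dim] = pos;
  }
  return out;
}

// Reference semantics for delinearizing `id` over `basis` (outermost first).
Vec delinearize(int64_t id, const Vec &basis) {
  Vec out(basis.size());
  for (size_t i = basis.size(); i-- > 0;) {
    out[i] = id % basis[i];
    id /= basis[i];
  }
  return out;
}
```

For sizes `{4, 16, 2}` and strides `{2, 0, 1}`, the basis is `(4, 2, 1)`: the broadcast dimension reads the last result, which is 0 for every id.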

@krzysz00 krzysz00 force-pushed the users/krzysz00/linearize-mma branch from d8e695c to 88ade5d Compare January 27, 2025 22:29
krzysz00 and others added 5 commits January 27, 2025 16:34
@krzysz00 krzysz00 merged commit ecd67d9 into main Jan 28, 2025
44 checks passed
@krzysz00 krzysz00 deleted the users/krzysz00/linearize-mma branch January 28, 2025 18:16