[TKW] Enable each MMA to have its own intrinsic #287
Conversation
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
```python
filename = f"wave_attention_{'x'.join(map(str, shape))}.mlir"
with open(filename, "w") as f:
    f.write(mb.module_op.get_asm())
rmse = torch.sqrt(torch.mean(torch.square(output - torch_ref)))
```
Can you use `assert_close` with appropriate `atol` and `rtol` here instead of manually computing this?
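For reference, a minimal sketch of what the suggested check could look like; the tolerance values here are placeholders to illustrate the API, not values tuned for FP8:

```python
import torch

# Placeholder tolerances -- these would need tuning for FP8 accuracy.
torch.testing.assert_close(output, torch_ref, atol=1e-2, rtol=1e-2)
```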
Yeah, but the FP8 range/accuracy is quite limited, so I'm not sure atol/rtol would be meaningful here. For example, we have a case where the rtol is just ridiculously large:
```
Mismatched elements: 2613487 / 2621440 (99.7%)
Greatest absolute difference: 0.04659026861190796 at index (26, 785, 31) (up to 1e-05 allowed)
Greatest relative difference: 1510450.625 at index (2, 593, 37) (up to 1.3e-06 allowed)

(Pdb) torch_ref[2, 593, 37]
tensor(9.8498e-09, device='cuda:0')
(Pdb) output[2, 593, 37]
tensor(0.0149, device='cuda:0')
```
We may be able to improve this once we improve the quantization/dequantization to get better range, but for now I don't think atol/rtol will give a very good idea of accuracy, due to the inconsistency of our FP8 range.
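To illustrate the point, here is a standalone sketch using the values from the mismatch above: when the reference value is near zero, the relative difference explodes even though the absolute error is small on the scale of the data.

```python
import torch

ref = torch.tensor(9.8498e-09)  # near-zero reference, as in torch_ref[2, 593, 37]
out = torch.tensor(0.0149)      # kernel output at that index

abs_diff = (out - ref).abs()     # ~0.0149: small in absolute terms
rel_diff = abs_diff / ref.abs()  # ~1.5e6: blown up by the near-zero denominator
print(abs_diff.item(), rel_diff.item())
```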
Comparing the same values as above, these tensors qualitatively look quite good, especially without any rescaling/adjusting of the range.
```
(Pdb) output[0,:]
tensor([[ 0.0500, -0.0215,  0.0078,  ..., -0.0116, -0.0850,  0.0386],
        [ 0.0232,  0.0168,  0.0137,  ...,  0.1215,  0.0033,  0.0275],
        [-0.0044, -0.0127,  0.0993,  ...,  0.0158,  0.0504,  0.0125],
        ...,
        [ 0.0696, -0.0049, -0.0455,  ...,  0.0299, -0.0396,  0.0955],
        [-0.0473,  0.1092, -0.0237,  ...,  0.0194, -0.0045,  0.0034],
        [-0.0104,  0.0096,  0.0394,  ...,  0.0266, -0.0525,  0.0692]],
       device='cuda:0')
(Pdb) torch_ref[0,:]
tensor([[ 0.0467, -0.0231,  0.0064,  ..., -0.0106, -0.0863,  0.0408],
        [ 0.0245,  0.0132,  0.0127,  ...,  0.1255,  0.0051,  0.0280],
        [-0.0083, -0.0181,  0.1016,  ...,  0.0180,  0.0563,  0.0145],
        ...,
        [ 0.0675, -0.0112, -0.0434,  ...,  0.0295, -0.0368,  0.0981],
        [-0.0471,  0.1066, -0.0230,  ...,  0.0206, -0.0002, -0.0072],
        [-0.0089,  0.0065,  0.0376,  ...,  0.0283, -0.0549,  0.0682]],
       device='cuda:0')
```
Sounds good, let's merge for now and iterate
Overall, LGTM! Just a question about using `assert_close`.
In order to align layouts in chained GEMM or attention in FP8, we'd need to use different intrinsics for the 1st and 2nd MMA. To achieve this, we'd need to do:
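As a rough sketch of the idea only: the `mma_type` keyword on `tkw.mma` and the specific `MMAType` variant names below are assumptions about what this PR exposes, not a confirmed API, and the surrounding kernel setup (constraints, register reads) is elided.

```python
# Hypothetical sketch: chained GEMM where each MMA selects its own intrinsic.
# Assumes this PR adds an `mma_type` keyword to tkw.mma; the MMAType variant
# names are illustrative guesses.

# 1st mma: intrinsic chosen so its result layout matches the operand
# layout the 2nd mma expects.
x_j = tkw.mma(q_reg, k_reg, c_reg, mma_type=MMAType.F32_32x32x16_F8)

# 2nd mma: a different intrinsic, aligned with the 1st mma's output layout.
acc = tkw.mma(p_reg, v_reg, acc, mma_type=MMAType.F32_16x16x32_F8)
```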