Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give tests parameterized by shapes better names #406

Merged
merged 8 commits into from
Jan 28, 2025

Conversation

GMNGeoffrey
Copy link
Contributor

@GMNGeoffrey GMNGeoffrey commented Jan 22, 2025

I found it helpful to have more descriptive test names when working on kernels, especially as you can select by them. This just changes the tests already using the shared get_test_shapes function. I think we could expand it to others, but there are some decisions around test organization that I don't want to get into just now.

This incidentally also appears to fix a bug that was creating duplicate perf tests. I printed out get_test_shapes("chained_gemm") and got the following:

(8, 128, 128, 64, 256)
(40, 1024, 64, 64, 1024)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)

Looks like each time shapes.py was imported, the loop adding perf tests got run again.

Before:

testChainedGemm[MMAType.F32_32x32x8_F16-False-shape15]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape14]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape13]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape12]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape11]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape10]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape9]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape8]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape7]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape6]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape5]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape4]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape3]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape2]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape1]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape0]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape15]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape14]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape13]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape12]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape11]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape10]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape9]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape8]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape7]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape6]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape5]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape4]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape3]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape2]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape1]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape0]

After:

testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256]

@GMNGeoffrey GMNGeoffrey requested a review from harsh-nod January 22, 2025 20:29
Copy link
Contributor

@harsh-nod harsh-nod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks nice, thank you!

Note that this adds perf tests for batched gemm.
@GMNGeoffrey
Copy link
Contributor Author

GMNGeoffrey commented Jan 27, 2025

Do we actually want all the shapes to be in a shared file? It allows sharing when you have multiple test files using the same shapes, but the only case of that is with attention, so I'm not sure whether it's worth the extra layer of indirection. WDYT? Also, do we always want perf tests to have the same shapes as regular tests? I've actually been testing with much smaller shapes when I've been developing, which has the advantage of being faster and usually easier to debug.

@GMNGeoffrey
Copy link
Contributor Author

To get this landed with the fix to not have oodles of extra test cases, I'm going to narrow the scope and just improve the names for the tests already using get_test_shapes

@GMNGeoffrey GMNGeoffrey requested a review from harsh-nod January 27, 2025 22:59
@GMNGeoffrey
Copy link
Contributor Author

Old PR description:

This is an example of what I think the change should be, but I didn't want to go through and do it for everything until checking in with others. I found it helpful to have more descriptive test names when working on kernels, especially as you can select by them.

This incidentally also appears to fix a bug that was creating duplicate perf tests. I printed out get_test_shapes("chained_gemm") and got the following:

(8, 128, 128, 64, 256)
(40, 1024, 64, 64, 1024)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)

Looks like each time shapes.py was imported, the loop adding perf tests got run again.

Before:

testChainedGemm[MMAType.F32_32x32x8_F16-False-shape15]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape14]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape13]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape12]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape11]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape10]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape9]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape8]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape7]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape6]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape5]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape4]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape3]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape2]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape1]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape0]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape15]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape14]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape13]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape12]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape11]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape10]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape9]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape8]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape7]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape6]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape5]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape4]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape3]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape2]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape1]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape0]

After:

testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256]

Before:

testGemm[MMAType.F32_32x32x8_F16-True-True-shape5]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape4]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape3]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape2]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape1]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape0]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape5]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape4]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape3]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape2]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape1]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape0]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape5]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape4]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape3]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape2]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape1]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape0]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape5]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape4]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape3]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape2]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape1]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape0]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape5]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape4]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape3]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape2]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape1]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape0]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape5]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape4]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape3]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape2]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape1]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape0]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape5]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape4]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape3]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape2]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape1]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape0]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape5]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape4]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape3]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape2]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape1]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape0]

After:

testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-4096x20480x2560-perf]

IMO, it's also a bit weird that the perf test param piggy-backs on shapes and then is separately controlled by a command line arg. Could it just be a separate parameter instead with the appropriate mark and the test can read from there rather than directly accessing the command line arg? One disadvantage to that is that in the current proposal there'd then be a "no_perf" suffix on all the non-perf tests. I couldn't figure out a way to my pytest have no id for a test param value (you can do the empty string but it still gets joined on - and your test name looks weird).

WDYT?

@harsh-nod
Copy link
Contributor

Do we actually want all the shapes to be in a shared file? It allows sharing when you have multiple test files using the same shapes, but the only case of that is with attention, so I'm not sure whether it's worth the extra layer of indirection. WDYT? Also, do we always want perf tests to have the same shapes as regular tests? I've actually been testing with much smaller shapes when I've been developing, which has the advantage of being faster and usually easier to debug.

Yes sharing shapes makes sense. I think it's okay to have that layer of indirection. Good point - in general we want the test shapes to be smaller but representative and have perf cover the final shapes of interest. The interesting thing here is that we also have iree-kernel-benchmark for more comprehensive performance testing, so think of the performance tests here as a smaller subset that we can use to identify perf regressions. This was the original idea but I don't think it was implemented well (or at all ).

@harsh-nod
Copy link
Contributor

Old PR description:

This is an example of what I think the change should be, but I didn't want to go through and do it for everything until checking in with others. I found it helpful to have more descriptive test names when working on kernels, especially as you can select by them.

This incidentally also appears to fix a bug that was creating duplicate perf tests. I printed out get_test_shapes("chained_gemm") and got the following:

(8, 128, 128, 64, 256)
(40, 1024, 64, 64, 1024)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)

Looks like each time shapes.py was imported, the loop adding perf tests got run again.

Before:

testChainedGemm[MMAType.F32_32x32x8_F16-False-shape15]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape14]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape13]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape12]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape11]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape10]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape9]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape8]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape7]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape6]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape5]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape4]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape3]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape2]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape1]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape0]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape15]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape14]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape13]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape12]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape11]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape10]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape9]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape8]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape7]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape6]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape5]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape4]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape3]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape2]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape1]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape0]

After:

testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256]

Before:

testGemm[MMAType.F32_32x32x8_F16-True-True-shape5]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape4]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape3]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape2]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape1]
testGemm[MMAType.F32_32x32x8_F16-True-True-shape0]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape5]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape4]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape3]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape2]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape1]
testGemm[MMAType.F32_32x32x8_F16-True-False-shape0]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape5]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape4]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape3]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape2]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape1]
testGemm[MMAType.F32_32x32x8_F16-False-True-shape0]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape5]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape4]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape3]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape2]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape1]
testGemm[MMAType.F32_32x32x8_F16-False-False-shape0]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape5]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape4]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape3]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape2]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape1]
testGemm[MMAType.F32_16x16x16_F16-True-True-shape0]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape5]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape4]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape3]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape2]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape1]
testGemm[MMAType.F32_16x16x16_F16-True-False-shape0]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape5]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape4]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape3]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape2]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape1]
testGemm[MMAType.F32_16x16x16_F16-False-True-shape0]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape5]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape4]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape3]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape2]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape1]
testGemm[MMAType.F32_16x16x16_F16-False-False-shape0]

After:

testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-no_dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-1024x5120x640]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_16x16x16_F16-dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-no_dyn-sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-no_sched-4096x20480x2560-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-1024x5120x640]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-2048x10240x1280]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-4096x20480x2560]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-1024x5120x640-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-2048x10240x1280-perf]
testGemm[MMAType.F32_32x32x8_F16-dyn-sched-4096x20480x2560-perf]

IMO, it's also a bit weird that the perf test param piggy-backs on shapes and then is separately controlled by a command line arg. Could it just be a separate parameter instead with the appropriate mark and the test can read from there rather than directly accessing the command line arg? One disadvantage to that is that in the current proposal there'd then be a "no_perf" suffix on all the non-perf tests. I couldn't figure out a way to my pytest have no id for a test param value (you can do the empty string but it still gets joined on - and your test name looks weird).

WDYT?

That makes sense. Please remove the command line arg and keep it as a separate parameter. The no_perf suffix is alright.

@GMNGeoffrey
Copy link
Contributor Author

Ok I can continue fiddling with it. Can we land this one as is so we can get rid of all the duplicate tests though (sorry if I'd realized that problem to begin with I would have just created a PR for that).

@GMNGeoffrey
Copy link
Contributor Author

Yes sharing shapes makes sense. I think it's okay to have that layer of indirection. Good point - in general we want the test shapes to be smaller but representative and have perf cover the final shapes of interest. The interesting thing here is that we also have iree-kernel-benchmark for more comprehensive performance testing, so think of the performance tests here as a smaller subset that we can use to identify perf regressions. This was the original idea but I don't think it was implemented well (or at all ).
That makes sense. Please remove the command line arg and keep it as a separate parameter. The no_perf suffix is alright.

Ok, given that we want different shapes for perf and non-perf, I think maybe what would make the most sense would be to have a parametrization that controls two parameters (shape and perf), which is a common thing in pytest, and we can have an API like @param_shapes(shapes=[(3, 16, 16, 16, 16)], perf_shapes=[(40, 512, 1024, 1024, 512)]). If we still want perf tests controlled by a command line arg, it could also be one that adjusts which params are even generated rather than generating perf tests and skipping them: https://docs.pytest.org/en/stable/example/parametrize.html#generating-parameters-combinations-depending-on-command-line

@GMNGeoffrey
Copy link
Contributor Author

Can we land this one as is so we can get rid of all the duplicate tests though?

(I don't have merge permissions)

@harsh-nod harsh-nod merged commit 09bd1a8 into iree-org:main Jan 28, 2025
10 checks passed
xintin pushed a commit to xintin/iree-turbine that referenced this pull request Feb 14, 2025
I found it helpful to have more descriptive test names when working on
kernels, especially as you can select by them. This just changes the
tests already using the shared `get_test_shapes` function. I think we
could expand it to others, but there are some decisions around test
organization that I don't want to get into just now.

This incidentally also appears to fix a bug that was creating duplicate
perf tests. I printed out `get_test_shapes("chained_gemm")` and got the
following:

```
(8, 128, 128, 64, 256)
(40, 1024, 64, 64, 1024)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
```

Looks like each time shapes.py was imported, the loop adding perf tests
got run again.

Before:

```
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape15]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape14]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape13]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape12]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape11]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape10]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape9]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape8]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape7]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape6]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape5]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape4]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape3]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape2]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape1]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape0]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape15]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape14]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape13]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape12]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape11]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape10]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape9]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape8]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape7]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape6]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape5]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape4]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape3]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape2]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape1]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape0]
```

After:

```
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256]
```

Signed-off-by: xintin <vermagaurav9408@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants