Give tests parameterized by shapes better names #406
Conversation
this looks nice, thank you!
Note that this adds perf tests for batched gemm.
Do we actually want all the shapes to be in a shared file? It allows sharing when you have multiple test files using the same shapes, but the only case of that is attention, so I'm not sure whether it's worth the extra layer of indirection. WDYT? Also, do we always want perf tests to have the same shapes as regular tests? I've actually been testing with much smaller shapes while developing, which has the advantage of being faster and usually easier to debug.
To get this landed with the fix (so we don't have oodles of extra test cases), I'm going to narrow the scope and just improve the names for the tests already using `get_test_shapes`.
Old PR description: This is an example of what I think the change should be, but I didn't want to go through and do it for everything until checking in with others. I found it helpful to have more descriptive test names when working on kernels, especially as you can select by them. This incidentally also appears to fix a bug that was creating duplicate perf tests. I printed out `get_test_shapes("chained_gemm")` and got a pile of duplicated, nested `ParameterSet` entries (the full dump and the before/after test names are in the PR description below). Looks like each time shapes.py was imported, the loop adding perf tests got run again.
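For context, here is a minimal sketch (not the actual shapes.py; the registry and helper names are made up for illustration) of how a loop that wraps shared shape entries in perf-marked `pytest.param`s can produce exactly this kind of duplicated, nested `ParameterSet` output when it ends up running more than once over the same list:

```
import pytest

# Hypothetical shared registry, standing in for what shapes.py provides.
_TEST_SHAPES = {
    "chained_gemm": [
        (8, 128, 128, 64, 256),
        (40, 1024, 64, 64, 1024),
    ],
}


def _add_perf_params():
    # Wrap every existing entry in a perf-marked pytest.param and append
    # it to the same shared list.
    for shapes in _TEST_SHAPES.values():
        shapes.extend(
            pytest.param(s, marks=pytest.mark.perf_only) for s in list(shapes)
        )


def get_test_shapes(name):
    return _TEST_SHAPES[name]


# Run once: each shape appears twice (plain + perf-marked), as intended.
_add_perf_params()
# Run again (e.g. the marking loop executes once per importing test module,
# or the module body gets re-executed): the already-marked ParameterSets are
# wrapped and appended again, giving nested
# ParameterSet(values=(ParameterSet(...),), ...) entries like those below.
_add_perf_params()

for entry in get_test_shapes("chained_gemm"):
    print(entry)
```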
IMO, it's also a bit weird that the perf test param piggy-backs on shapes and is then separately controlled by a command line arg. Could it just be a separate parameter instead, with the appropriate mark, so the test can read from there rather than directly accessing the command line arg? One disadvantage is that in the current proposal there'd then be a "no_perf" suffix on all the non-perf tests; I couldn't figure out a way to make pytest use no id at all for a test param value (you can use the empty string, but it still gets joined on the `-` separator). WDYT?
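For illustration, a rough sketch of that suggestion, assuming the project's `perf_only` mark (the `run_perf` parameter name and the ids are hypothetical): perf becomes its own parameter carrying the mark, and the test reads it instead of a command line arg. Per the comment above, an empty id still gets joined into the name, which is why the non-perf case ends up with a `no_perf` suffix:

```
import pytest

perf_param = pytest.mark.parametrize(
    "run_perf",
    [
        # An empty id would still get joined with "-", so the non-perf
        # case gets an explicit "no_perf" id rather than no suffix.
        pytest.param(False, id="no_perf"),
        pytest.param(True, id="perf", marks=pytest.mark.perf_only),
    ],
)


@perf_param
@pytest.mark.parametrize(
    "shape", [(8, 128, 128, 64, 256), (40, 1024, 64, 64, 1024)]
)
def testChainedGemm(shape, run_perf):
    # The test reads run_perf here instead of inspecting a command line
    # option; perf-only runs are selected via the perf_only mark.
    if run_perf:
        ...  # benchmark this shape
    else:
        ...  # run the correctness check
```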
Yes, sharing shapes makes sense. I think it's okay to have that layer of indirection. Good point - in general we want the test shapes to be smaller but representative, and have perf cover the final shapes of interest. The interesting thing here is that we also have iree-kernel-benchmark for more comprehensive performance testing, so think of the performance tests here as a smaller subset that we can use to identify perf regressions. This was the original idea, but I don't think it was implemented well (or at all).
That makes sense. Please remove the command line arg and keep it as a separate parameter. The `no_perf` suffix is alright.
Ok, I can continue fiddling with it. Can we land this one as-is, though, so we can get rid of all the duplicate tests? (Sorry, if I'd realized that problem to begin with I would have just created a PR for that.)
Ok, given that we want different shapes for perf and non-perf, I think maybe what would make the most sense would be to have a parametrization that controls two parameters (shape and perf), which is a common thing in pytest, and we can have an API along those lines.
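A possible shape for that API, sketched with a hypothetical `param_shapes` helper (not an existing function in this repo): a single parametrization supplies both `shape` and `perf`, the perf shapes can differ from the correctness shapes, and no `no_perf` suffix is needed:

```
import pytest


def param_shapes(test_shapes, perf_shapes):
    """Build one parametrization covering both shape and perf."""
    params = [
        pytest.param(s, False, id="x".join(map(str, s))) for s in test_shapes
    ]
    params += [
        pytest.param(
            s,
            True,
            id="x".join(map(str, s)) + "-perf",
            marks=pytest.mark.perf_only,
        )
        for s in perf_shapes
    ]
    return pytest.mark.parametrize("shape,perf", params)


# Smaller shapes for correctness, larger ones only for perf runs.
@param_shapes(
    test_shapes=[(8, 128, 128, 64, 256)],
    perf_shapes=[(40, 1024, 64, 64, 1024)],
)
def testChainedGemm(shape, perf):
    ...
```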
(I don't have merge permissions)
I found it helpful to have more descriptive test names when working on kernels, especially as you can select by them. This just changes the tests already using the shared `get_test_shapes` function. I think we could expand it to others, but there are some decisions around test organization that I don't want to get into just now.

This incidentally also appears to fix a bug that was creating duplicate perf tests. I printed out `get_test_shapes("chained_gemm")` and got the following:
```
(8, 128, 128, 64, 256)
(40, 1024, 64, 64, 1024)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((8, 128, 128, 64, 256),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
ParameterSet(values=(ParameterSet(values=(ParameterSet(values=((40, 1024, 64, 64, 1024),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None),), marks=(MarkDecorator(mark=Mark(name='perf_only', args=(), kwargs={})),), id=None)
```
Looks like each time shapes.py was imported, the loop adding perf tests got run again.
Before:
```
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape15]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape14]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape13]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape12]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape11]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape10]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape9]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape8]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape7]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape6]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape5]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape4]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape3]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape2]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape1]
testChainedGemm[MMAType.F32_32x32x8_F16-False-shape0]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape15]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape14]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape13]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape12]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape11]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape10]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape9]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape8]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape7]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape6]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape5]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape4]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape3]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape2]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape1]
testChainedGemm[MMAType.F32_16x16x16_F16-False-shape0]
```
After:
```
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_32x32x8_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_32x32x8_F16-False-8x128x128x64x256]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256-perf]
testChainedGemm[MMAType.F32_16x16x16_F16-False-40x1024x64x64x1024]
testChainedGemm[MMAType.F32_16x16x16_F16-False-8x128x128x64x256]
```
Signed-off-by: xintin <vermagaurav9408@gmail.com>
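For reference, descriptive ids like the ones above can be generated from the shape tuples with a parametrize `ids` callable; a minimal sketch (the helper name is illustrative, and this is not necessarily how the PR implements it):

```
import pytest


def shape_id(shape):
    # (8, 128, 128, 64, 256) -> "8x128x128x64x256"
    return "x".join(str(dim) for dim in shape)


@pytest.mark.parametrize(
    "shape",
    [(8, 128, 128, 64, 256), (40, 1024, 64, 64, 1024)],
    ids=shape_id,
)
def testChainedGemm(shape):
    # Collected as e.g. testChainedGemm[8x128x128x64x256], so a single
    # shape can be selected with `pytest -k 8x128x128x64x256`.
    ...
```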