[feat] InplaceNorm, affine-less LayerNorm #53
Conversation
Considering that the fused matmul kernel already supports a joint compilation with the activation function, I'm not sure how useful it'd be to add the activation part of GLU to normalization.
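For context, the "activation part of GLU" is the elementwise gating that follows the projection. A minimal PyTorch sketch, where the function name and the sigmoid default are illustrative rather than the xformers API:

```python
import torch

def glu_gate(x: torch.Tensor, activation=torch.sigmoid) -> torch.Tensor:
    # Split the projected features in half and gate one half with the other.
    a, b = x.chunk(2, dim=-1)
    return a * activation(b)

# With the activation already fused into the matmul epilogue, the gating
# above is the only piece left that could be folded into the normalization.
```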
Here are the benchmarks for LayerNorm compared to affine-less LayerNorm.
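For context, "affine-less" means the normalization runs without the learnable per-channel weight and bias. A minimal PyTorch sketch of the two variants being benchmarked (shapes are illustrative):

```python
import torch

d_model = 1024
x = torch.randn(8, 197, d_model)

# Standard LayerNorm: normalization followed by a learnable scale and shift.
ln = torch.nn.LayerNorm(d_model)

# Affine-less LayerNorm: same normalization, but no weight/bias parameters,
# which saves two parameter loads and a multiply-add per element.
ln_plain = torch.nn.LayerNorm(d_model, elementwise_affine=False)

assert ln(x).shape == ln_plain(x).shape == x.shape
```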
Codecov Report

@@            Coverage Diff            @@
##              main      #53    +/-   ##
=========================================
  Coverage         ?    54.48%
=========================================
  Files            ?        70
  Lines            ?      4076
  Branches         ?         0
=========================================
  Hits             ?      2221
  Misses           ?      1855
  Partials         ?         0
=========================================

Flags with carried forward coverage won't be shown.
I just noticed that this PR contains parts of #45. Is that going to be an issue? Should I pick it apart?
It's easier for review and landing if we keep things orthogonal, if that's ok with you?
Otherwise the results are pretty great @ClashLuke, and composing the transforms while the data is in shared memory is the right thing to do, I think.
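To make the "compose while the data is on-chip" point concrete, here is a rough Triton sketch; the kernel name, signature, and the ReLU epilogue are illustrative assumptions, not the kernel in this PR:

```python
import triton
import triton.language as tl


@triton.jit
def fused_norm_act_fw(X, Y, stride, N, eps, BLOCK_SIZE_N: tl.constexpr):
    # One program handles one row; the row never leaves on-chip memory.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE_N)
    mask = cols < N

    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)

    # Affine-less LayerNorm: centre and rescale, no weight or bias.
    mean = tl.sum(x, axis=0) / N
    x_zm = tl.where(mask, x - mean, 0.0)
    var = tl.sum(x_zm * x_zm, axis=0) / N
    y = x_zm / tl.sqrt(var + eps)

    # Compose the next transform (a ReLU here) while the data is still in
    # registers / shared memory - no extra round trip through global memory.
    y = tl.where(y > 0.0, y, 0.0)

    tl.store(Y + row * stride + cols, y, mask=mask)


# Launch sketch: one program per row, block size padded to a power of two.
# fused_norm_act_fw[(M,)](x, y, x.stride(0), N, 1e-5,
#                         BLOCK_SIZE_N=triton.next_power_of_2(N))
```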
"Something is wrong in the backward graph, possibly because of an inplace operation after the layernorm" | ||
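The failure mode this message guards against can be reproduced with plain PyTorch. A minimal sketch, assuming standard autograd semantics (not the xformers code path): an in-place write clobbers a tensor that the layernorm backward still needs.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16, requires_grad=True)
h = x * 2.0                  # intermediate tensor, saved by layer_norm for its backward
y = F.layer_norm(h, (16,))

h.add_(1.0)                  # in-place write *after* the layernorm

try:
    y.sum().backward()
except RuntimeError as err:
    # autograd detects that a saved tensor was modified by an in-place operation
    print(err)
```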
# enqueue kernel using forward pass heuristics
# also compute partial sums for DW and DB
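For reference, the quantities these partial sums accumulate are the usual LayerNorm weight and bias gradients. A hedged PyTorch sketch of the two-stage reduction (block size and names are illustrative, not the kernel's actual tiling):

```python
import torch

# dy: upstream gradient, x_hat: normalized input, both of shape (M, N).
M, N = 64, 1024
dy = torch.randn(M, N)
x_hat = torch.randn(M, N)

# Each kernel block reduces its slice of rows into a partial sum...
BLOCK_M = 16
partial_dw = torch.stack([(dy[i:i + BLOCK_M] * x_hat[i:i + BLOCK_M]).sum(0)
                          for i in range(0, M, BLOCK_M)])
partial_db = torch.stack([dy[i:i + BLOCK_M].sum(0)
                          for i in range(0, M, BLOCK_M)])

# ...and a second, small kernel folds the partials into the final gradients.
dw = partial_dw.sum(0)   # == (dy * x_hat).sum(0)
db = partial_db.sum(0)   # == dy.sum(0)

assert torch.allclose(dw, (dy * x_hat).sum(0), atol=1e-5)
assert torch.allclose(db, dy.sum(0), atol=1e-5)
```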
# fmt: off
meta = {"BLOCK_SIZE_N": ctx.BLOCK_SIZE_N,
You probably saw that, but this is the same as passing it directly down when calling the kernel.
I could integrate it into the two calls below. It just felt cleaner to not repeat shared arguments.
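Both spellings launch the kernel with the same compile-time constants; a rough sketch of the equivalence, with hypothetical kernel and variable names:

```python
# Shared compile-time arguments collected once (names here are stand-ins):
meta = {"BLOCK_SIZE_N": 2048, "GROUP_SIZE_M": 32}

# Splatted into each launch...
# grad_input_kernel[grid_dx](dx, dy, x, mean, rstd, **meta)
# grad_weight_bias_kernel[grid_dw](partial_dw, partial_db, dw, db, **meta)

# ...which Triton treats exactly like spelling the constants at each call site:
# grad_input_kernel[grid_dx](dx, dy, x, mean, rstd,
#                            BLOCK_SIZE_N=2048, GROUP_SIZE_M=32)
```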
Provided you rebase out of the other PR, this looks great to me, thanks a lot @ClashLuke. I'd like to homogenize the coding style on all the kernels so that this part is easier to grasp, but I'll do that after this lands.
@ClashLuke ok to ?
It doesn't look like rebasing solved the problem, as I merged inside of the graph, so I just squashed the changes into a single commit.
@ClashLuke alright, looks good to me, you would just need to rebase on main now :) (sorry about all that, the doc build problem was fixed yesterday)
Looks good, I'll land as soon as the final test is done, thank you so much @ClashLuke! Adding you to the contributors in a future README update.
This should resolve #50