UpSample-nearest cuda kernel update #21694
This PR updates the nearest-neighbor upsampling CUDA kernel:
1. Avoids atomicAdd for better fp16 performance.
2. Uses a better launch configuration for 2D input.
Perf numbers and scripts will be posted shortly. cc @ngimel
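To illustrate why dropping atomicAdd is possible at all, here is a minimal CPU sketch in Python, not the kernel from this PR. It assumes a 1D input, an integer scale factor, and an output of exactly `in_size * scale` elements; the function names are hypothetical. The scatter version mirrors a one-thread-per-output-element design, where several threads accumulate into the same input gradient (the `+=` would need an atomicAdd on a GPU). The gather version mirrors a one-thread-per-input-element design, where each element is written exactly once.

```python
def backward_scatter(grad_output, in_size, scale):
    """Scatter form: iterate over output elements (hypothetical helper).

    On a GPU, each iteration would be one thread, and the `+=` below
    would have to be an atomicAdd because several output elements
    map to the same input element.
    """
    grad_input = [0.0] * in_size  # accumulator must be zeroed first
    for out_idx, g in enumerate(grad_output):
        src_idx = out_idx // scale  # nearest-neighbor source index
        grad_input[src_idx] += g
    return grad_input


def backward_gather(grad_output, in_size, scale):
    """Gather form: iterate over input elements (hypothetical helper).

    Each input element sums the contiguous block of output gradients
    that were copied from it, then is written exactly once. No atomics
    and no pre-zeroing are needed.
    """
    grad_input = []
    for in_idx in range(in_size):
        start = in_idx * scale
        grad_input.append(sum(grad_output[start:start + scale]))
    return grad_input
```

Both forms compute the same gradient; the gather form trades contended atomic writes for a short per-thread loop, which is the kind of trade that tends to help fp16, where atomic adds are comparatively expensive.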
The fp16 forward perf numbers are all over the place, especially for tiny inputs. :/ Here's the script for the benchmark
Removed the specialized 2D kernel, as the speedup only shows up sporadically. Caching seems to do a great job of rescuing the memory access pattern, so I don't think I can justify keeping a dedicated kernel there 😢
@ngimel happy to merge this if you give it the OK
Make sure there are checks against empty tensors, and make sure you are not excessively zeroing outputs. Those already might be somewhere in the code, and I might be blind, in which case it is good to go.
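As a rough illustration of the empty-tensor concern, here is a Python sketch of a launch-size helper, not code from this PR. The name `launch_config` and the default of 256 threads per block are assumptions. The point is that a ceil-division over zero elements yields zero blocks, and a CUDA launch with a zero-sized grid is an invalid configuration, so the caller should return early instead of launching.

```python
def launch_config(numel, threads_per_block=256):
    """Return (blocks, threads) for a 1D launch, or None to skip it.

    Hypothetical helper: `numel` is the number of elements the kernel
    would process. An empty tensor gives numel == 0, which would
    produce a grid of 0 blocks, so we signal "do not launch" instead.
    """
    if numel == 0:
        return None  # empty tensor: nothing to compute, skip the launch
    # Ceil-division so the last partial block is still covered.
    blocks = (numel + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block
```

A caller would check for `None` before launching and simply return the (empty) output tensor in that case.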
Addressed review comments. Should be good to go once the tests pass.
Review comments are addressed, great job, Jie!
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: updating upsampling kernel:
1. avoids atomicAdd for better fp16 performance.
2. better launch configures for 2D input.
Pull Request resolved: pytorch/pytorch#21694
Differential Revision: D15875791
Pulled By: ezyang
fbshipit-source-id: 426fc5d5f0c0cdf58bfa1a2b564f17a9ea286fa4