Saturation in compiler generated Stickify #2877
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Updated the code to generate the min/max directly. Updated performance:
Saturation costs about 3% in performance (compared to compiler-generated code without saturation), and the result is still 55% faster than zDNN. Verified that it generates the right code.
LGTM. Thanks for a quick update!
Jenkins Linux s390x Build #15143 [push] Saturation in compiler g... started at 22:33
Jenkins Linux amd64 Build #15138 [push] Saturation in compiler g... started at 21:33
Jenkins Linux ppc64le Build #14168 [push] Saturation in compiler g... started at 22:44
Jenkins Linux amd64 Build #15138 [push] Saturation in compiler g... passed after 1 hr 13 min
Jenkins Linux s390x Build #15143 [push] Saturation in compiler g... passed after 1 hr 39 min
Jenkins Linux ppc64le Build #14168 [push] Saturation in compiler g... passed after 2 hr 9 min
This change adds the equivalent of the saturation logic suggested by William Jones to the compiler-generated stickify code.
Performance was measured on z16 with Roberta base 11 and 6x384 in sequential mode.
With saturation, the compiler-generated code still runs faster than the zDNN-based code.
Checked the generated asm: the pattern uses vector compare and vector select, which I believe is the best pattern at this time for floating-point numbers.
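For readers unfamiliar with the compare-and-select pattern mentioned above, here is a minimal scalar sketch in C++. It is not the code from this PR: the names `saturateOne` and `saturateInPlace` and the `limit` parameter are hypothetical, and the real bound would be whatever maximum the target format can represent. It only illustrates how a clamp can be written as two compare/select pairs that a compiler can vectorize.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): clamp x into [-limit, +limit]
// using two compare + select steps instead of a library min/max call.
// A NaN input passes through unchanged because both comparisons are false.
inline float saturateOne(float x, float limit) {
  float upper = (x > limit) ? limit : x;     // compare, then select the upper bound
  return (upper < -limit) ? -limit : upper;  // compare, then select the lower bound
}

// Hypothetical helper: apply the clamp to every element before a
// narrowing conversion would take place. `limit` is an assumed parameter.
void saturateInPlace(std::vector<float> &values, float limit) {
  for (std::size_t i = 0; i < values.size(); ++i)
    values[i] = saturateOne(values[i], limit);
}
```

Writing the clamp as explicit comparisons keeps the loop body branch-free in intent, which is the shape that typically maps to vector compare and vector select instructions.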