
Saturation in compiler generated Stickify #2877

Merged (11 commits), Jul 16, 2024

AlexandreEichenberger (Collaborator):

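This PR adds the equivalent of the following code (per William Jones) to the compiler-generated Stickify code, so that the fp32 inputs are clamped (saturated) before vec_round_from_fp32 rounds them to 16-bit values.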

    // Saturation bounds: +/-8573157376.0f in every lane
    static const __vector float min_vec = {-8573157376.F, -8573157376.F, -8573157376.F, -8573157376.F};
    static const __vector float max_vec = {8573157376.F, 8573157376.F, 8573157376.F, 8573157376.F};

    float* a = ...;
    float* b = ...;
    int16_t* out = ...;

    __vector float* vec_a = (__vector float*)a;
    __vector float* vec_b = (__vector float*)b;
    __vector short* vec_out = (__vector short*)out;

    // Clamp each fp32 lane into [min_vec, max_vec], then round the two
    // 4-lane fp32 vectors into one vector of eight 16-bit values.
    *vec_out = vec_round_from_fp32(
        vec_max(vec_min(*vec_a, max_vec), min_vec),
        vec_max(vec_min(*vec_b, max_vec), min_vec),
        0);

Results when running RoBERTa base 11 with 6x384 input on z16 in sequential mode:

    Configuration                          Op           Calls  Avg time   Total time  %
    zDNN                                   zhigh.Stick  85     0.5742549  48.8116667  5.6%
    compiler gen code without saturation   zhigh.Stick  85     0.3593294  30.5430000  3.7%
    compiler gen WITH saturation           zhigh.Stick  85     0.4358196  37.0446667  4.4%

Even with saturation, the compiler-generated code is still faster than the zDNN-based code (0.4358196 vs 0.5742549).

I checked the generated assembly: the clamp uses a vector compare followed by a vector select, which I believe is the best pattern currently available for floating-point numbers.

    81e6:	e7 91 10 00 24 ea 	vfchesb	%v9,%v17,%v1                    <<<<==== vadd
    81ec:	e7 20 20 00 21 8d 	vsel	%v2,%v0,%v2,%v18                   <<<<==== vshuffle

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
AlexandreEichenberger (Collaborator, Author):

Updated the code to generate the vector min/max operations directly instead of compare and select.

Updated performance:

    Configuration                          Op           Calls  Avg time   Total time  %
    zDNN                                   zhigh.Stick  85     0.5742549  48.8116667  5.6%
    compiler gen code without saturation   zhigh.Stick  85     0.3593294  30.5430000  3.7%
    compiler gen WITH saturation           zhigh.Stick  85     0.3676314  31.2486667  3.8%

That is a slowdown of about 2.3% compared to the compiler-generated code without saturation (0.3676314 vs 0.3593294), and the code is still about 1.56x faster than zDNN (0.5742549 vs 0.3676314).

Verified that it now generates the vector floating-point minimum/maximum instructions (vfminsb/vfmaxsb):

    8050:	e7 11 00 40 2c ee 	vfminsb	%v17,%v17,%v0,4                 <<<<==== vadd
...
    8062:	e7 22 10 40 20 ef 	vfmaxsb	%v2,%v2,%v1,4                   <<<<==== vadd

tungld (Collaborator) left a comment:

LGTM. Thanks for a quick update!

tungld merged commit 7879d17 into onnx:main on Jul 16, 2024.
7 checks passed
jenkins-droid (Collaborator):

Jenkins Linux s390x Build #15143 [push] Saturation in compiler g... started at 22:33
Jenkins Linux amd64 Build #15138 [push] Saturation in compiler g... started at 21:33
Jenkins Linux ppc64le Build #14168 [push] Saturation in compiler g... started at 22:44
Jenkins Linux amd64 Build #15138 [push] Saturation in compiler g... passed after 1 hr 13 min
Jenkins Linux s390x Build #15143 [push] Saturation in compiler g... passed after 1 hr 39 min
Jenkins Linux ppc64le Build #14168 [push] Saturation in compiler g... passed after 2 hr 9 min
