AArch64 base algorithm refactoring in LLVM #6907
Perhaps slightly more explanation on the add-pair part of the intrinsic, otherwise looks like a significant improvement.
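As a possible illustration of the add-pair step mentioned above, here is a minimal pure-Python sketch of how a pairwise add (in the style of the AArch64 `ADDP` vector instruction) can fold four 4-lane partial sums into one vector of per-row totals. The helper names `addp` and `reduce_rows` are hypothetical and are not taken from the PR; the sketch also ignores 32-bit wraparound, and the actual intrinsic may arrange the reduction differently.

```python
def addp(a, b):
    """Pairwise add: sum adjacent lanes of `a`, then adjacent lanes of `b`.

    For 4-lane inputs this yields [a0+a1, a2+a3, b0+b1, b2+b3].
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same, even number of lanes")

    def pairs(v):
        return [v[i] + v[i + 1] for i in range(0, len(v), 2)]

    return pairs(a) + pairs(b)


def reduce_rows(rows):
    """Fold four 4-lane accumulators into one vector of four row totals."""
    r01 = addp(rows[0], rows[1])  # [a0+a1, a2+a3, b0+b1, b2+b3]
    r23 = addp(rows[2], rows[3])  # [c0+c1, c2+c3, d0+d1, d2+d3]
    return addp(r01, r23)         # [sum(a), sum(b), sum(c), sum(d)]


if __name__ == "__main__":
    accumulators = [[1, 2, 3, 4], [10, 20, 30, 40], [1, 1, 1, 1], [0, 0, 0, 5]]
    print(reduce_rows(accumulators))
```

Two levels of pairwise adds are enough here because each level halves the number of lanes contributed by every input vector.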
cc @FrozenGene @yzhliu if you're interested
This PR stemmed from apache#6907 and fixes a small error in the getter and setter of a buffer for the case where `t.lanes > 1`. I also added a test to exercise the issue.
* [Bug-fix] Fix tir allocation with multiple lanes

  This PR stemmed from #6907 and fixes a small error in the getter and setter of a buffer for the case where `t.lanes > 1`. I also added a test to exercise the issue.

* Address dtyped vs non-dtyped constant cases
- Refactored the assembly in `arm_cpu/tensor_intrin.py` to use LLVM+TIR.
- Removed the `interleave` boolean parameter of the intrinsic, which switched between two different interleaving modes. LLVM now takes care of interleaving the instructions.
- Updated `conv2d_gemm.py` accordingly to call the right intrinsic.

Note: I found LLVM to be very sensitive to the choice of `-mcpu`. To preserve performance, it is important to specify the right `-mcpu` when creating the LLVM target.
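To make the `-mcpu` note concrete, the following is a minimal sketch of building an LLVM target string that always carries an explicit `-mcpu`. The helper `make_aarch64_target` is hypothetical, and the triple, CPU name and attribute shown are example values chosen for illustration, not values taken from this PR.

```python
def make_aarch64_target(mcpu, mtriple="aarch64-linux-gnu", mattr="+neon"):
    """Build an LLVM target string for an AArch64 CPU (hypothetical helper).

    An explicit `mcpu` is required so that LLVM schedules and interleaves
    instructions for the intended core rather than for a generic one.
    """
    if not mcpu:
        raise ValueError("an explicit -mcpu value is required")
    return f"llvm -mtriple={mtriple} -mcpu={mcpu} -mattr={mattr}"


if __name__ == "__main__":
    # "cortex-a72" is only an example; use the CPU you are deploying to.
    print(make_aarch64_target("cortex-a72"))
```

The resulting string can then be passed to `tvm.target.Target(...)` when building, assuming a TVM build with the LLVM backend enabled.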
3b5f93b to d54e73a
Just some final comments on the docstrings.
lgtm
This is now merged, thanks @giuseros!
* AArch64 base algorithm refactoring in LLVM

  - Refactored the assembly in `arm_cpu/tensor_intrin.py` to use LLVM+TIR.
  - Removed the `interleave` boolean parameter of the intrinsic, which switched between two different interleaving modes. LLVM now takes care of interleaving the instructions.
  - Updated `conv2d_gemm.py` accordingly to call the right intrinsic.

  Note: I found LLVM to be very sensitive to the choice of `-mcpu`. To preserve performance, it is important to specify the right `-mcpu` when creating the LLVM target.

* Fix linting
* Fix linting -2
* Fixing comments
* Address review comments
* Fix spaces around ':' in docstrings