AArch64 base algorithm refactoring in LLVM #6907

giuseros · 2020-11-12T19:22:50Z

- I refactored the assembly in arm_cpu/tensor_intrin.py to use LLVM+TIR.
- Removed the `interleave` boolean parameter in the intrinsic, which switched
  between two different interleaving modes. LLVM now takes care of
  interleaving the instructions.
- Applied the corresponding changes to conv2d_gemm.py so that it calls the
  right intrinsic.

Note: I found LLVM very sensitive to the choice of `-mcpu`.
So, in order to preserve performance, it is important to specify the
right `-mcpu` when creating the LLVM target.
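
As a rough sketch of the note above, the target string can be assembled so that `-mcpu` is always given explicitly. The helper name `aarch64_llvm_target`, the `cortex-a72` CPU, the `aarch64-linux-gnu` triple and the `+neon` attribute are illustrative assumptions, not values taken from this PR; pick the ones that match the device being targeted.

```python
def aarch64_llvm_target(mcpu, mtriple="aarch64-linux-gnu", mattr=("+neon",)):
    """Build an LLVM target string that always carries an explicit -mcpu.

    Hypothetical helper for illustration only; the defaults are assumptions.
    """
    if not mcpu:
        raise ValueError("pass an explicit -mcpu for the device you are targeting")
    parts = ["llvm", f"-mtriple={mtriple}", f"-mcpu={mcpu}"]
    if mattr:
        parts.append("-mattr=" + ",".join(mattr))
    return " ".join(parts)


target_str = aarch64_llvm_target("cortex-a72")
print(target_str)

# With TVM installed, the string could then be used to create the target:
#   import tvm
#   target = tvm.target.Target(target_str)
```

Keeping `-mcpu` a required argument makes it harder to fall back silently to a generic CPU model, which is the situation the note warns about.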

giuseros · 2020-11-12T19:23:51Z

cc @mbaret @u99127 @anijain2305

mbaret

Perhaps add slightly more explanation of the add-pair part of the intrinsic; otherwise this looks like a significant improvement.

python/tvm/tir/ir_builder.py

python/tvm/topi/arm_cpu/tensor_intrin.py

mbaret · 2020-11-17T16:42:28Z

cc @FrozenGene @yzhliu if you're interested

* Bug-fix] Fix tir allocation with multiple lanes

  This PR stemmed from #6907 and fixes a small error in the getter and setter of a buffer for the case where `t.lanes > 1`. I also added a test that exercises the issue.

* Address dtyped vs non-dtyped constant cases
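
To illustrate why `t.lanes > 1` needs special handling in a buffer getter and setter, here is a minimal sketch in plain Python. It assumes that element `index` of a buffer with a multi-lane dtype covers `lanes` consecutive scalar slots starting at `index * lanes`. The helper name `vector_element_indices` is hypothetical, and this is not the code from ir_builder.py.

```python
def vector_element_indices(index, lanes):
    """Return the scalar offsets covered by one vector element.

    Assumption for illustration: vector element `index` spans `lanes`
    consecutive scalars with stride 1, starting at `index * lanes`.
    """
    if lanes < 1:
        raise ValueError("lanes must be a positive integer")
    base = index * lanes
    return [base + k for k in range(lanes)]


# Element 2 of a 4-lane buffer covers scalar slots 8 through 11.
print(vector_element_indices(2, 4))
# With a single lane the vector index and the scalar index coincide.
print(vector_element_indices(3, 1))
```

Under this assumption, a getter or setter that used `index` directly as the scalar offset would address the wrong slots whenever `lanes > 1`.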

- I refactored the assembly in arm_cpu/tensor_intrin.py to use LLVM+TIR
- Removed the `interleave` boolean parameter in the intrinsic, which switched between two different interleaving modes. LLVM now takes care of interleaving the instructions
- Applied the corresponding changes to conv2d_gemm.py so that it calls the right intrinsic

Note: I found LLVM very sensitive to the choice of `-mcpu`. So, in order to preserve performance, it is important to specify the right `-mcpu` when creating the LLVM target

mbaret

Just some final comments on the docstrings.

python/tvm/topi/arm_cpu/tensor_intrin.py

mbaret

lgtm

mbaret · 2020-11-23T20:15:58Z

This is now merged, thanks @giuseros !


mbaret self-assigned this Nov 16, 2020

mbaret added the status: need review label Nov 16, 2020

mbaret reviewed Nov 17, 2020

giuseros mentioned this pull request Nov 19, 2020

Bug-fix] Fix tir allocation with multiple lanes #6941

Merged

Giuseppe Rossini added 4 commits November 20, 2020 10:55

Fix linting

b9a971a

Fix linting -2

40664b0

Fixing comments

d54e73a

giuseros force-pushed the aarch64_llvm_refactoring branch from 3b5f93b to d54e73a on November 20, 2020 11:05

mbaret reviewed Nov 23, 2020

python/tvm/topi/arm_cpu/tensor_intrin.py (three review comments, outdated and resolved)

Giuseppe Rossini added 2 commits November 23, 2020 14:48

Address review comments

45c1902

Fix spaces around ':' in docstrings

f55d031

mbaret added status: review in progress and removed status: need review labels Nov 23, 2020

mbaret approved these changes Nov 23, 2020

mbaret merged commit 5423ffe into apache:main Nov 23, 2020

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
