[CODEGEN] ARM Popcount lowering rule and codegen updates #1235
…ing and accessing vectors
There is an unintended change that reverts the submodule to an older version. Please update the submodule (HalideIR) to the latest version. You can do this by running git pull inside the HalideIR folder.
src/codegen/llvm/codegen_llvm.cc
Outdated
@@ -366,7 +366,7 @@ llvm::Value* CodeGenLLVM::CreateBroadcast(llvm::Value* value, int lanes) {
 llvm::Value* CodeGenLLVM::CreateVecSlice(llvm::Value* vec, int begin, int extent) {
   int num_elems = static_cast<int>(vec->getType()->getVectorNumElements());
   if (extent == num_elems && begin == 0) return vec;
-  CHECK_LT(begin + extent, num_elems);
+  CHECK_LT(begin + extent, num_elems+1);
CHECK_LT -> CHECK_LE, i.e. CHECK_LE(begin + extent, num_elems) instead of CHECK_LT(begin + extent, num_elems+1).
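The boundary case behind this check can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical helper `SliceInBounds` that is not part of TVM and only stands in for the check in CreateVecSlice. A slice covers lanes [begin, begin + extent), so it is valid exactly when begin + extent <= num_elems, and a slice that ends at the last lane must pass.

```cpp
#include <cassert>

// Hypothetical stand-in for the bounds check in CreateVecSlice; not TVM code.
// A slice covers lanes [begin, begin + extent), so its end may equal num_elems.
bool SliceInBounds(int begin, int extent, int num_elems) {
  return begin >= 0 && extent >= 0 && begin + extent <= num_elems;
}

// What a strict "less than" check on the unmodified bound would accept.
// It wrongly rejects a slice that ends exactly at the last lane.
bool SliceInBoundsStrict(int begin, int extent, int num_elems) {
  return begin >= 0 && extent >= 0 && begin + extent < num_elems;
}
```

For an 8-lane vector, the upper half (begin = 4, extent = 4) is accepted by `SliceInBounds` but rejected by the strict variant, which is the case the original CHECK_LT failed on.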
  return CodeGenCPU::CreateIntrinsic(op);
}

Expr CodeGenARM::ARMPopcount(const Call *call) {
We will need a regression test for this rule. Please add an ARM popcount test case in a new file, tests/python/unittest/test_codegen_arm.py.
Since we don't have an ARM device to verify on, what we can do is dump the asm file (maybe we can patch GetSource in the LLVM module to support get_source("asm")) and verify that the NEON sequence is as expected.
src/codegen/llvm/codegen_arm.cc
Outdated
  ::llvm::Intrinsic::ID vpaddu_id = ::llvm::Intrinsic::arm_neon_vpaddlu;

  Type uint8_type = Type(e.type().code(), 8, e.type().bits() * e.type().lanes() / 8);
Move the type definition after the fallback guard, and add a comment that the division is always exact (no remainder).
Add a comment explaining what this specific NEON instruction sequence pattern is.
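For readers unfamiliar with the pattern under review, here is a scalar sketch of the general idea: view the vector as 8-bit lanes, count the set bits in each byte, then repeatedly add adjacent lanes while widening (the role vpaddlu plays) until the lanes are back at the original width. The function names are hypothetical, and the per-byte counting step is an assumption, since only the uint8 reinterpretation and the vpaddlu intrinsic appear in the diff above. This models the arithmetic only and is not the PR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a per-byte bit count on 8-bit lanes (assumed first step).
std::vector<uint32_t> CountBitsPerByte(const std::vector<uint8_t>& bytes) {
  std::vector<uint32_t> out;
  for (uint8_t b : bytes) {
    uint32_t n = 0;
    for (; b != 0; b >>= 1) n += b & 1u;
    out.push_back(n);
  }
  return out;
}

// Scalar model of a pairwise widening add: adjacent lanes are summed,
// which halves the lane count and (on hardware) doubles the lane width.
std::vector<uint32_t> PairwiseAdd(const std::vector<uint32_t>& v) {
  assert(v.size() % 2 == 0);
  std::vector<uint32_t> out;
  for (std::size_t i = 0; i + 1 < v.size(); i += 2) out.push_back(v[i] + v[i + 1]);
  return out;
}

// Popcount of each 32-bit lane via the byte-count-then-pairwise-add pattern.
std::vector<uint32_t> PopcountU32Lanes(const std::vector<uint32_t>& lanes) {
  std::vector<uint8_t> bytes;
  for (uint32_t x : lanes) {
    for (int i = 0; i < 4; ++i) bytes.push_back(static_cast<uint8_t>(x >> (8 * i)));
  }
  // bytes.size() == 32 * lanes.size() / 8; the division is exact because
  // the lane width is a multiple of 8.
  std::vector<uint32_t> counts = CountBitsPerByte(bytes);
  counts = PairwiseAdd(counts);  // 8-bit lanes  -> 16-bit lanes
  counts = PairwiseAdd(counts);  // 16-bit lanes -> 32-bit lanes
  return counts;
}
```

Each pairwise step combines the counts of two neighbouring bytes, so two steps are enough to go from 8-bit lanes back to 32-bit lanes; a 16-bit input would need one step and a 64-bit input three.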
Thanks, this is merged!
Nice!
TVM compiler changes for low precision operators
Thanks for contributing to TVM! Please refer to guideline http://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from others in the community.