Add ARM32 kernel implementation #432

Merged: 24 commits, Aug 4, 2020
Commits (24)
7494af9 Add arm32 4x4 kernel impl (Jul 22, 2020)
d21344a Add arm32 4x4 kernel impl (honglh, Jul 22, 2020)
d94b1e6 Add arm32 4x4 kernel impl (honglh, Jul 22, 2020)
3263994 Update to clang-format-5.0 format (Jul 30, 2020)
f132e64 Remove redundant #if 1 (Jul 30, 2020)
62fe3f4 Undef constants after use (Jul 30, 2020)
695a4c7 Corrected incorrect ifdef comment (honglh, Jul 30, 2020)
cba31a5 Add bgemm_kernels_arm32.h as dependent (honglh, Jul 30, 2020)
89aac02 Use vpaddl.u8 and u16 better because popcnt will not be negative (Jul 30, 2020)
229e94a use proper kernel for compiled arch (Aug 2, 2020)
9716065 use proper kernel for compiled arch (Aug 2, 2020)
7cee40f use proper kernel for compiled arch (Aug 2, 2020)
762b55b Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
73600a1 Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
6fa0901 Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
0a375d4 Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
f7b65cc Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
0b141a7 Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
9c32d6e Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
4332f55 Update larq_compute_engine/core/bgemm_kernels_arm.h (honglh, Aug 3, 2020)
32c954c Update larq_compute_engine/core/bgemm_kernels_arm32.h (honglh, Aug 3, 2020)
bc4b7f5 Use kNeon for 32-bit input only (honglh, Aug 4, 2020)
d64ae7c Use kNeon for 32-bit input only (honglh, Aug 4, 2020)
4daf896 fix clang-format incompliance (honglh, Aug 4, 2020)
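Commit 89aac02 concerns the NEON population-count sequence: a common pattern (which this kernel appears to use) is VCNT.8 to count set bits per byte, followed by pairwise widening adds (VPADDL) that fold the per-byte counts into wider lanes. Because a bit count is never negative, the unsigned .u8/.u16 forms match the data. The scalar sketch below emulates that sequence for a single 32-bit word; the function names are hypothetical, and the real kernel operates on whole NEON registers in assembly rather than on one word at a time.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for VCNT.8: count the set bits in each byte lane.
std::array<std::uint8_t, 4> CountBitsPerByte(std::uint32_t word) {
  std::array<std::uint8_t, 4> counts{};
  for (int lane = 0; lane < 4; ++lane) {
    std::uint8_t byte = static_cast<std::uint8_t>(word >> (8 * lane));
    std::uint8_t count = 0;
    while (byte != 0) {
      count = static_cast<std::uint8_t>(count + (byte & 1u));
      byte = static_cast<std::uint8_t>(byte >> 1);
    }
    counts[lane] = count;  // 0..8, always non-negative
  }
  return counts;
}

// Hypothetical stand-in for VPADDL.U8: add adjacent unsigned 8-bit lanes,
// zero-extending the sums into 16-bit lanes.
std::array<std::uint16_t, 2> PairwiseAddWidenU8(
    const std::array<std::uint8_t, 4>& v) {
  return {static_cast<std::uint16_t>(v[0] + v[1]),
          static_cast<std::uint16_t>(v[2] + v[3])};
}

// Hypothetical stand-in for VPADDL.U16: add adjacent unsigned 16-bit lanes
// into one 32-bit lane.
std::uint32_t PairwiseAddWidenU16(const std::array<std::uint16_t, 2>& v) {
  return static_cast<std::uint32_t>(v[0]) + v[1];
}

// Population count of one 32-bit word via count-per-byte plus widening adds.
std::uint32_t Popcount32(std::uint32_t word) {
  const std::uint32_t total =
      PairwiseAddWidenU16(PairwiseAddWidenU8(CountBitsPerByte(word)));
  assert(total <= 32u);  // a 32-bit word has at most 32 set bits
  return total;
}
```

For example, `Popcount32(0x0F0F0F0Fu)` yields per-byte counts {4, 4, 4, 4}, then {8, 8}, then 16.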
larq_compute_engine/core/BUILD (1 addition, 0 deletions)
@@ -108,6 +108,7 @@ cc_library(
     name = "bgemm_kernels_arm",
     hdrs = [
         "bgemm_kernels_arm.h",
+        "bgemm_kernels_arm32.h",
         "bgemm_kernels_arm64.h",
     ],
     deps = [
larq_compute_engine/core/bgemm_impl_ruy.h (3 additions, 2 deletions)
@@ -77,8 +77,9 @@ struct BGemmImplUsingRuy {
     if (bgemm_runtime_path == ruy::Path::kNeonDotprod)
       bgemm_runtime_path = ruy::Path::kNeon;
 #if RUY_PLATFORM_NEON_32
-    // 32-bit NEON optimized code is not available yet
-    bgemm_runtime_path = ruy::Path::kStandardCpp;
+    if (!std::is_same<LhsScalar, std::uint32_t>::value) {
+      bgemm_runtime_path = ruy::Path::kStandardCpp;
+    }
 #endif
     // Currently we only have 32-bit and 64-bit optimized kernels.
     // For 8-bit, fall back to the standard cpp kernel.
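With this change, 32-bit NEON builds keep the kNeon path only when the LHS scalar is std::uint32_t; every other scalar type falls back to kStandardCpp. A minimal sketch of that selection logic follows. It is an illustration under stated assumptions, not the library code: the Path enum is a hypothetical stand-in for ruy::Path, the is_neon32 flag replaces the RUY_PLATFORM_NEON_32 preprocessor check, and the later 8-bit fallback mentioned in the hunk's trailing comment is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for ruy::Path, reduced to the values in this hunk.
enum class Path { kStandardCpp, kNeon, kNeonDotprod };

// Sketch of the runtime-path selection in BGemmImplUsingRuy after this PR.
// is_neon32 stands in for the compile-time RUY_PLATFORM_NEON_32 check.
template <typename LhsScalar>
Path SelectBgemmPath(Path detected, bool is_neon32) {
  Path path = detected;
  // As in the hunk: a kNeonDotprod request is mapped to kNeon.
  if (path == Path::kNeonDotprod) path = Path::kNeon;
  // On 32-bit NEON only the std::uint32_t kernel is optimized.
  if (is_neon32 && !std::is_same<LhsScalar, std::uint32_t>::value) {
    path = Path::kStandardCpp;
  }
  assert(path != Path::kNeonDotprod);  // never returned after the mapping
  return path;
}
```

Before this PR the 32-bit branch forced kStandardCpp unconditionally; guarding it with `std::is_same` lets the new ARM32 kernel be selected for 32-bit bitpacked input while other types still use the portable kernel.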
larq_compute_engine/core/bgemm_kernels_arm.h (25 additions, 0 deletions)
@@ -50,6 +50,31 @@ struct BgemmKernel<ruy::Path::kNeonDotprod, LhsScalar, RhsScalar, DstScalar,
   }
 };
 
+#if RUY_PLATFORM_NEON && RUY_OPT(ASM) && RUY_PLATFORM_NEON_32
+// A BGEMM kernel for ARM32 Neon.
+#include "bgemm_kernels_arm32.h"
+template <>
+struct BgemmKernel<ruy::Path::kNeon, std::uint32_t, std::uint32_t, float,
+                   BinaryMulParams<std::int32_t, float>> {
+  Tuning tuning = Tuning::kAuto;
+  using LhsLayout = FixedKernelLayout<Order::kColMajor, 4, 4>;
+  using RhsLayout = FixedKernelLayout<Order::kColMajor, 4, 4>;
+  explicit BgemmKernel(Tuning tuning_) : tuning(tuning_) {}
+  void Run(const ruy::PMat<std::uint32_t>& lhs,
+           const ruy::PMat<std::uint32_t>& rhs,
+           const BinaryMulParams<std::int32_t /* accum. scalar */, float>&
+               mul_params,
+           int start_row, int start_col, int end_row, int end_col,
+           ruy::Mat<float>* dst) const {
+    BinaryKernelParams<LhsLayout::kCols, RhsLayout::kCols, std::uint32_t>
+        params;
+    MakeBinaryKernelParams(lhs, rhs, mul_params, start_row, start_col, end_row,
+                           end_col, dst, &params);
+    BinaryKernelNeonOutOfOrder32BP4x4(params);
+  }
+};
+#endif
+
 #if RUY_PLATFORM_NEON && RUY_OPT(ASM) && RUY_PLATFORM_NEON_64
 // A BGEMM kernel for ARM64 Neon.
 #include "bgemm_kernels_arm64.h"
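The new specialization packs operands into 4x4 kernel layouts and hands each block to the assembly routine BinaryKernelNeonOutOfOrder32BP4x4. As a sketch of the arithmetic a binary GEMM of this kind performs, assume +/-1 values are bit-packed into std::uint32_t words (set bit = -1, clear bit = +1); the dot product of two packed vectors is then depth_bits - 2 * popcount(a XOR b), since matching bits contribute +1 and mismatching bits -1. The portable reference below walks the half-open ranges [start_row, end_row) x [start_col, end_col) in 4x4 tiles. Its name, the plain row-major vectors standing in for ruy::PMat and ruy::Mat, and the omission of depth padding and of the BinaryMulParams output transform are all assumptions; it is not the PR's code.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kTile = 4;  // 4x4 output tile, matching the kernel's name

// Hypothetical portable reference for a bit-packed binary GEMM.
// lhs: one row of depth_words packed words per output row.
// rhs: one row of depth_words packed words per output column.
// dst: row-major, dst_cols floats per row.
void ReferenceBinaryGemm4x4(const std::vector<std::uint32_t>& lhs,
                            const std::vector<std::uint32_t>& rhs,
                            int depth_words, int start_row, int start_col,
                            int end_row, int end_col, int dst_cols,
                            std::vector<float>* dst) {
  assert(depth_words > 0);
  assert(end_col <= dst_cols);
  assert(static_cast<int>(dst->size()) >= end_row * dst_cols);
  const int depth_bits = 32 * depth_words;
  for (int r0 = start_row; r0 < end_row; r0 += kTile) {
    for (int c0 = start_col; c0 < end_col; c0 += kTile) {
      // One 4x4 tile, clipped at the edges of the requested block.
      for (int r = r0; r < r0 + kTile && r < end_row; ++r) {
        for (int c = c0; c < c0 + kTile && c < end_col; ++c) {
          int mismatches = 0;
          for (int k = 0; k < depth_words; ++k) {
            const std::uint32_t x =
                lhs[r * depth_words + k] ^ rhs[c * depth_words + k];
            mismatches += static_cast<int>(std::bitset<32>(x).count());
          }
          // +/-1 dot product: matches minus mismatches.
          (*dst)[r * dst_cols + c] =
              static_cast<float>(depth_bits - 2 * mismatches);
        }
      }
    }
  }
}
```

With one packed word of depth, identical operands give 32, fully complementary operands give -32, and operands that differ in exactly 16 bits give 0.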