Support general SIMD instruction #8

harihitode · 2021-12-09T13:05:24Z

Make ndt_omp independent of the SIMD architecture:

Use the compiler flag -march=native instead of hard-coding the SSE flags.
Use an aligned allocator for std::vector of Eigen::Matrix.

kenji-miyake · 2021-12-09T13:09:22Z

I'll close and re-open since I changed the target branch to try to run CI workflows.
-> But it seems there is no effect. 😢

KeisukeShima · 2022-01-13T06:29:04Z

@kenji-miyake Please double-check this PR 🙏

kenji-miyake · 2022-01-13T06:32:12Z

CMakeLists.txt

-# should use march=native ?
-add_definitions(-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2)
-set(CMAKE_CXX_FLAGS "-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2")
+add_compile_options(-march=native)


Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 not affect the performance?
Could you link any references around these options?

Also, are there any reasons for removing -std=c++14?

> Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 not affect the performance?

Yes, @harihitode san evaluated the performance.
https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870

Here is the reference.
http://eigen.tuxfamily.org/index.php?title=FAQ#How_can_I_enable_vectorization.3F

> Also, are there any reasons for removing -std=c++14?

Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14) instead.

> Yes, @harihitode san evaluated the performance.
> https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870

Since it's an internal link, we should copy the result here so that everyone can see it.

> Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14) instead.

Yes, but making it another PR is better.

> Yes, but making it another PR is better.

OK, then I'll restore the C++14 option.
3b3b92a

> Since it's an internal link, we should copy the result here so that everyone can see it.

Thank you, but please also describe the conditions of the experiment.

@kenji-miyake @KeisukeShima The result was obtained on an Intel machine using the rosbag replay simulation.

CPU: i9-9900, which supports AVX2 (256-bit SIMD)
We played the rosbag as instructed in https://tier4.github.io/autoware.proj/tree/main/tutorial/QuickStart/ .

The red line is our proposal, which uses 256-bit SIMD in the executable binary. The blue line is the current default, 128-bit SIMD.
Because the program itself is not tuned for 256-bit SIMD, we see neither a performance improvement nor a degradation.

@yukkysaito Thanks for your comment. I think this change has no effect on performance, but I have evaluated it only in the rosbag simulation.
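
For reference, one way to check which SIMD width a given set of flags actually enables is to inspect the compiler's predefined macros. The sketch below assumes GCC or Clang, which define `__SSE2__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`, `__AVX512F__`, and `__ARM_NEON`. The helper names `simd_width_bits`, `simd_target_name`, and `float_lanes` are hypothetical and are not part of ndt_omp.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of ndt_omp): widest SIMD register width, in
// bits, that the compiler was allowed to target for this translation unit.
constexpr int simd_width_bits() {
#if defined(__AVX512F__)
  return 512;
#elif defined(__AVX__)
  return 256;
#elif defined(__SSE2__) || defined(__ARM_NEON)
  return 128;
#else
  return 0;  // no SIMD extension known to this sketch was detected
#endif
}

// Human-readable name of the most capable instruction set detected.
inline std::string simd_target_name() {
#if defined(__AVX512F__)
  return "AVX-512";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE4_2__)
  return "SSE4.2";
#elif defined(__SSE2__)
  return "SSE2";
#elif defined(__ARM_NEON)
  return "NEON";
#else
  return "none detected";
#endif
}

// Number of 32-bit floats that fit in one SIMD register of that width.
inline int float_lanes() {
  const int bits = simd_width_bits();
  assert(bits % 32 == 0);  // every width returned above is a multiple of 32
  return bits / 32;
}
```

Built with the old `-msse4.2` flags, `simd_width_bits()` would be expected to return 128; built with `-march=native` on an AVX2-capable CPU it should return 256, and on an aarch64 target with NEON it should return 128. Printing `simd_target_name()` once at startup is a cheap way to confirm what a given binary was compiled for.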

kenji-miyake · 2022-01-13T06:34:13Z

include/pclomp/ndt_omp.h

@@ -228,7 +228,7 @@ namespace pclomp
 		}

 		/** \brief Return the transformation array */
-		inline const std::vector<Eigen::Matrix4f>
+		inline const std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f>>


Could you link Eigen's documents as well?
Since I think we'll submit a PR to upstream, we need some references to explain to the maintainer.

I guess it has something to do with this issue isl-org/Open3D#653
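
For context, Eigen's documentation on STL containers states that, before C++17, containers of fixed-size vectorizable types such as Eigen::Matrix4f need Eigen::aligned_allocator, because the default allocator does not guarantee over-aligned storage. The sketch below illustrates the mechanism with the standard library only: `Mat4`, `AlignedAllocator`, `is_aligned`, and `make_transforms` are hypothetical stand-ins (this is not Eigen's implementation), the 32-byte alignment is an assumption chosen to mimic a 256-bit SIMD requirement, and `posix_memalign` assumes a POSIX platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-in for a fixed-size vectorizable type such as Eigen::Matrix4f.
// alignas(32) is an assumption that mimics a 256-bit SIMD alignment need.
struct alignas(32) Mat4 {
  float data[16];
};

// Hypothetical minimal allocator that honours alignof(T).
// In the PR, Eigen::aligned_allocator plays this role.
template <typename T>
struct AlignedAllocator {
  using value_type = T;

  AlignedAllocator() = default;
  template <typename U>
  AlignedAllocator(const AlignedAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    // posix_memalign needs a power-of-two multiple of sizeof(void*).
    const std::size_t alignment =
        alignof(T) < sizeof(void*) ? sizeof(void*) : alignof(T);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, n * sizeof(T)) != 0) {
      throw std::bad_alloc();
    }
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(const AlignedAllocator<T>&, const AlignedAllocator<U>&) {
  return true;  // stateless: any instance can free another's memory
}
template <typename T, typename U>
bool operator!=(const AlignedAllocator<T>&, const AlignedAllocator<U>&) {
  return false;
}

// True if p is a multiple of alignment (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Analogue of the changed return type: a vector whose storage is aligned.
inline std::vector<Mat4, AlignedAllocator<Mat4>> make_transforms(std::size_t n) {
  return std::vector<Mat4, AlignedAllocator<Mat4>>(n);
}
```

With this allocator, `is_aligned(make_transforms(5).data(), alignof(Mat4))` holds by construction, whereas a plain `std::vector<Mat4>` compiled as C++14 has no such guarantee. That missing guarantee is what makes aligned SIMD loads unsafe on wider architectures.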

kenji-miyake · 2022-01-13T06:37:33Z

Also, why has the CI failed? I think we should fix it.

KeisukeShima · 2022-01-13T06:57:51Z

The CI targets are noetic and melodic. The CI scripts for ROS2 are ready for review in #5.

kenji-miyake · 2022-01-13T07:09:13Z

Let's merge this after #5 is merged and the CI is checked in this PR.

@KeisukeShima @harihitode Could you send PRs equivalent to #5 and #8 to https://github.com/koide3/ndt_omp?

kenji-miyake · 2022-01-13T08:01:32Z

@KeisukeShima Could you rebase this, please? 🙏

KeisukeShima · 2022-01-13T08:08:22Z

This repository does not allow force push, so I created a new PR. #9

kenji-miyake · 2022-01-13T08:09:51Z

hmm, actually it was possible.

kenji-miyake

@yukkysaito @YamatoAndo @e-takeuchi I'd like to merge this PR for ARM support.
According to the evaluation result by @harihitode, there is no performance degradation.

Is that okay with you?

yukkysaito · 2022-01-13T12:21:12Z

@kenji-miyake Looks good, although it remains an open question whether the same results can be obtained on the actual in-vehicle machines.

harihitode requested review from KeisukeShima and kenji-miyake December 9, 2021 13:05

harihitode self-assigned this Dec 9, 2021

kenji-miyake changed the base branch from master to tier4/main December 9, 2021 13:08

kenji-miyake closed this Dec 9, 2021

kenji-miyake reopened this Dec 9, 2021

harihitode mentioned this pull request Dec 9, 2021

fix(ndt_scan_matcher): fix compiler flags and Eigen aligned allocator to support general SIMD architecture autowarefoundation/autoware.universe#154

Merged

7 tasks

KeisukeShima approved these changes Jan 13, 2022

kenji-miyake reviewed Jan 13, 2022

KeisukeShima mentioned this pull request Jan 13, 2022

Support general SIMD instruction #9

Closed

KeisukeShima closed this Jan 13, 2022

kenji-miyake reopened this Jan 13, 2022

harihitode and others added 3 commits January 13, 2022 17:10

not to use architecture specific SIMD flags

e719f84

use aligned_allocator for eigen data vector to be used correctly by SIMD instructions

259f92e

restore c++14 option

07b7f5a

Signed-off-by: Keisuke Shima <19993104+KeisukeShima@users.noreply.github.com>

kenji-miyake force-pushed the feature/fix_simd branch from 3b3b92a to 07b7f5a Compare January 13, 2022 08:10

kenji-miyake approved these changes Jan 13, 2022

kenji-miyake reviewed Jan 13, 2022

yukkysaito self-requested a review January 17, 2022 07:00

yukkysaito approved these changes Jan 17, 2022

KeisukeShima mentioned this pull request Jan 17, 2022

Change SIMD flags to support aarch64 koide3/ndt_omp#44

Merged

KeisukeShima merged commit b9b7170 into tier4/main Jan 17, 2022

KeisukeShima deleted the feature/fix_simd branch January 17, 2022 08:12
