-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support general SIMD instruction #8
Conversation
I'll close and re-open since I changed the target branch to try to run CI workflows. |
@kenji-miyake Please double-check this PR 🙏 |
# should use march=native ? | ||
add_definitions(-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2) | ||
set(CMAKE_CXX_FLAGS "-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2") | ||
add_compile_options(-march=native) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2
not affect the performance?
Could you link any references around these options?
Also, are there any reasons for removing -std=c++14
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 not affect the performance?
Yes, @harihitode san evaluated the performance.
https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870
Here is the reference.
http://eigen.tuxfamily.org/index.php?title=FAQ#How_can_I_enable_vectorization.3F
Also, are there any reasons for removing -std=c++14?
Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14)
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @harihitode san evaluated the performance.
https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870
Since it's an internal link, we should copy the result here so that everyone can see it.
Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14) instead.
Yes, but making it another PR is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but making it another PR is better.
OK. Then I restore c++14 option.
3b3b92a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, but you should write the condition of the experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kenji-miyake @KeisukeShima The result is obtained on Intel machine using rosbag playing simulation.
CPU: i9-9900 supporting AVX256 (256 bit SIMD)
We play ROSBAG as instructed in https://tier4.github.io/autoware.proj/tree/main/tutorial/QuickStart/ .
Red line is our proposal which uses 256 bit SIMD in the execution binary. Current default is the blue line, 128 bit SIMD.
Because this program itself is not tuned to the 256 bit SIMD, we have no performance-up either performance degradation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yukkysaito Thanks for your comment. I think this change has no effect on its performance, but I have evaluated it only on rosbag-simulation.
@@ -228,7 +228,7 @@ namespace pclomp | |||
} | |||
|
|||
/** \brief Return the transformation array */ | |||
inline const std::vector<Eigen::Matrix4f> | |||
inline const std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you link Eigen's documents as well?
Since I think we'll submit a PR to upstream, we need some references to explain to the maintainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it has something to do with this issue isl-org/Open3D#653
Also, why the CI has failed? I think we should fix them. |
The CI target is noetic and melodic. CI scripts for ROS2 is ready for review on #5 |
Let's merge this after #5 is merged and the CI is checked in this PR. @KeisukeShima @harihitode Could you send PRs equivalent with #5 and #8 to https://github.com/koide3/ndt_omp? |
@KeisukeShima Could you rebase this, please? 🙏 |
This repository does not allow force push, so I created a new PR. #9 |
hmm, actually it was possible. |
Signed-off-by: Keisuke Shima <19993104+KeisukeShima@users.noreply.github.com>
3b3b92a
to
07b7f5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yukkysaito @YamatoAndo @e-takeuchi I'd like to merge this PR for ARM support.
According to the evaluation result by @harihitode, there is no performance degradation.
Is that okay for you?
@kenji-miyake It is good, although there is a question of whether the same results can be obtained on real machines on vehicles. |
Making ndt_omp independent to the SIMD architecture