Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support general SIMD instruction #8

Merged
merged 3 commits into from
Jan 17, 2022
Merged

Conversation

harihitode
Copy link

Making ndt_omp independent to the SIMD architecture

  • using compiler flag -march=native, instead of using sse flags immediately.
  • using aligned allocation std::vector of Eigen::Matrix

@harihitode harihitode self-assigned this Dec 9, 2021
@kenji-miyake kenji-miyake changed the base branch from master to tier4/main December 9, 2021 13:08
@kenji-miyake
Copy link

kenji-miyake commented Dec 9, 2021

I'll close and re-open since I changed the target branch to try to run CI workflows.
-> But it seems there is no effect. 😢

@KeisukeShima
Copy link

@kenji-miyake Please double-check this PR 🙏

# should use march=native ?
add_definitions(-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2)
set(CMAKE_CXX_FLAGS "-std=c++14 -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2")
add_compile_options(-march=native)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 not affect the performance?
Could you link any references around these options?

Also, are there any reasons for removing -std=c++14?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will removing -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 not affect the performance?

Yes, @harihitode san evaluated the performance.
https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870

Here is the reference.
http://eigen.tuxfamily.org/index.php?title=FAQ#How_can_I_enable_vectorization.3F

Also, are there any reasons for removing -std=c++14?

Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14) instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @harihitode san evaluated the performance.
https://tier4.atlassian.net/browse/T4PB-12676?focusedCommentId=69870

Since it's an internal link, we should copy the result here so that everyone can see it.

Ah, I guess we need to keep this option. It's better to use set(CMAKE_CXX_STANDARD 14) instead.

Yes, but making it another PR is better.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but making it another PR is better.

OK. Then I restore c++14 option.
3b3b92a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's an internal link, we should copy the result here so that everyone can see it.

image

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but you should write the condition of the experiment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenji-miyake @KeisukeShima The result is obtained on Intel machine using rosbag playing simulation.

CPU: i9-9900 supporting AVX256 (256 bit SIMD)
We play ROSBAG as instructed in https://tier4.github.io/autoware.proj/tree/main/tutorial/QuickStart/ .

Red line is our proposal which uses 256 bit SIMD in the execution binary. Current default is the blue line, 128 bit SIMD.
Because this program itself is not tuned to the 256 bit SIMD, we have no performance-up either performance degradation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yukkysaito Thanks for your comment. I think this change has no effect on its performance, but I have evaluated it only on rosbag-simulation.

@@ -228,7 +228,7 @@ namespace pclomp
}

/** \brief Return the transformation array */
inline const std::vector<Eigen::Matrix4f>
inline const std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f>>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you link Eigen's documents as well?
Since I think we'll submit a PR to upstream, we need some references to explain to the maintainer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it has something to do with this issue isl-org/Open3D#653

@kenji-miyake
Copy link

Also, why the CI has failed? I think we should fix them.

@KeisukeShima
Copy link

The CI target is noetic and melodic. CI scripts for ROS2 is ready for review on #5

@kenji-miyake
Copy link

Let's merge this after #5 is merged and the CI is checked in this PR.

@KeisukeShima @harihitode Could you send PRs equivalent with #5 and #8 to https://github.com/koide3/ndt_omp?

@kenji-miyake
Copy link

@KeisukeShima Could you rebase this, please? 🙏

@KeisukeShima
Copy link

This repository does not allow force push, so I created a new PR. #9

@kenji-miyake
Copy link

hmm, actually it was possible.

@kenji-miyake kenji-miyake reopened this Jan 13, 2022
harihitode and others added 3 commits January 13, 2022 17:10
Signed-off-by: Keisuke Shima <19993104+KeisukeShima@users.noreply.github.com>
Copy link

@kenji-miyake kenji-miyake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yukkysaito @YamatoAndo @e-takeuchi I'd like to merge this PR for ARM support.
According to the evaluation result by @harihitode, there is no performance degradation.

Is that okay for you?

@yukkysaito
Copy link
Collaborator

@kenji-miyake It is good, although there is a question of whether the same results can be obtained on real machines on vehicles.

@yukkysaito yukkysaito self-requested a review January 17, 2022 07:00
@KeisukeShima KeisukeShima merged commit b9b7170 into tier4/main Jan 17, 2022
@KeisukeShima KeisukeShima deleted the feature/fix_simd branch January 17, 2022 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants