perf(ring_outlier_filter): performance tuning #3014
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp
for (int i = walk_first_idx; i <= walk_last_idx; i++) {
  auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
  *output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
  output_size += output.point_step;
This part assumes the data layout is PointXYZI, or at least that the input memory layout starts the same way as PointXYZI. In practice it will just copy the first 4 fields. Is there a reason to ignore the other fields from the input (ring, azimuth, etc.)? I know that today Autoware uses the ring filter as the last filter that needs "extended" fields, but who knows if that will still be true in the future?
At the moment, it is fine with no changes (cf. autoware.universe/common/autoware_point_types/include/autoware_point_types/types.hpp, lines 59 to 63 in c7d84c6):
float x{0.0F};
float y{0.0F};
float z{0.0F};
char padding1[4]{0U};
float intensity{0.0F};
But we have a bug and are dealing with it in #2618.
From a long-term perspective, we will make changes according to this design.
I see. Thank you for the link. I find the manual 4-byte padding quite strange... but that's another topic.
Still, I think this way of using reinterpret_cast is UB. For example, since the compiler is not told to avoid padding the 2 structs (with annotations such as __attribute__((packed))), it is "free" to pad the structs as it sees fit.
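One way to address both concerns in this thread (copying only the leading fields, and relying on reinterpret_cast over a raw byte buffer) is sketched below. It is not the PR's code: PointXYZI here is a local stand-in whose layout is taken from the types.hpp snippet quoted above, and copy_point and read_point are hypothetical helper names. The static_asserts make the layout assumption explicit at compile time, and std::memcpy moves bytes without dereferencing a cast pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Local stand-in for the point type discussed above (assumed layout:
// x, y, z, 4 bytes of manual padding, intensity).
struct PointXYZI
{
  float x{0.0F};
  float y{0.0F};
  float z{0.0F};
  char padding1[4]{};
  float intensity{0.0F};
};

// Fail the build if the compiler lays the struct out differently than assumed.
static_assert(std::is_trivially_copyable<PointXYZI>::value, "memcpy requires this");
static_assert(offsetof(PointXYZI, intensity) == 16, "unexpected padding before intensity");
static_assert(sizeof(PointXYZI) == 20, "unexpected struct size");

// Copies one whole point (all point_step bytes, so ring/azimuth/etc. survive)
// from src to dst and returns the byte offset just past the copied point.
std::size_t copy_point(
  const std::vector<std::uint8_t> & src, std::size_t src_offset,
  std::vector<std::uint8_t> & dst, std::size_t dst_offset, std::size_t point_step)
{
  assert(src_offset + point_step <= src.size());
  assert(dst_offset + point_step <= dst.size());
  std::memcpy(&dst[dst_offset], &src[src_offset], point_step);
  return dst_offset + point_step;
}

// Reads the leading PointXYZI fields out of a byte buffer without aliasing it.
PointXYZI read_point(const std::vector<std::uint8_t> & buf, std::size_t offset)
{
  assert(offset + sizeof(PointXYZI) <= buf.size());
  PointXYZI p;
  std::memcpy(&p, &buf[offset], sizeof(PointXYZI));
  return p;
}
```

Compilers typically lower a fixed-size memcpy like this to plain loads and stores, so the performance cost relative to the cast is expected to be small, though that would need to be measured in the node itself.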
It is a universal problem in Autoware and not related to this Pull Request, so please raise another Issue.
std::max(current_pt_distance, next_pt_distance) <
  std::min(current_pt_distance, next_pt_distance) * distance_ratio_ &&
std::max(current_distance, next_distance) <
  std::min(current_distance, next_distance) * distance_ratio_ &&
azimuth_diff < 100.f) {
Should azimuth_diff < 100.f be a parameter?
This number has been hard-coded for a long time and should, of course, be parameterised, but doing so is not necessary for this PR.
for (size_t idx = 0U; idx < indices.size() - 1; ++idx) {
  const size_t & current_data_idx = indices[idx];
  const size_t & next_data_idx = indices[idx + 1];
  walk_last_idx = idx;
I think we are skipping the last point with this logic.
For example, let's imagine a perfect circle ring with N points.
When we enter the loop for the first time we have:
idx = 0
walk_first_idx = 0
walk_last_idx = idx = 0
current_data_idx = indices[0]
next_data_idx = indices[1]
The ring is a perfect circle, so all the points are part of the same walk. So we always continue here:
if (
std::max(current_distance, next_distance) <
std::min(current_distance, next_distance) * distance_ratio_ &&
azimuth_diff < 100.f) {
continue; // Determined to be included in the same walk
}
On the last iteration we have:
idx = N-2
walk_first_idx = 0
walk_last_idx = idx = N-2
current_data_idx = indices[N-2]
next_data_idx = indices[N-1]
So after the loop, we have these lines:
if (isCluster(
input, indices[walk_first_idx], indices[walk_last_idx],
walk_last_idx - walk_first_idx + 1)) {
for (int i = walk_first_idx; i <= walk_last_idx; i++) {
auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
*output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
output_size += output.point_step;
}
}
}
which basically copies the points from walk_first_idx = 0 to walk_last_idx = N-2, thus skipping the last point.
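To make the off-by-one concrete, here is a minimal sketch of a walk segmentation that always closes the final walk at the last point. It is an illustration only, not the code in this PR: split_walks and the same_walk predicate are hypothetical names, and the predicate stands in for the distance-ratio and azimuth check quoted above. The loop condition is written as idx + 1 < n so that an empty ring does not underflow the unsigned size.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits points 0..n-1 of one ring into walks.
// same_walk(i) must return true when points i and i+1 belong to the same walk.
// Each returned pair is an inclusive [first, last] index range.
template <typename Pred>
std::vector<std::pair<std::size_t, std::size_t>> split_walks(std::size_t n, Pred same_walk)
{
  std::vector<std::pair<std::size_t, std::size_t>> walks;
  if (n == 0) {
    return walks;
  }
  std::size_t first = 0;
  for (std::size_t idx = 0; idx + 1 < n; ++idx) {
    if (same_walk(idx)) {
      continue;  // point idx + 1 extends the current walk
    }
    walks.emplace_back(first, idx);  // current walk ends at idx
    first = idx + 1;                 // next walk starts at idx + 1
  }
  // The final walk always ends at the last point, n - 1, not at n - 2.
  walks.emplace_back(first, n - 1);
  return walks;
}
```

For a perfect circle of N points (the predicate is always true) this yields the single walk [0, N-1], so the last point is kept.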
same as below
    }
  }
  tmp_indices.clear();
}
Note: the line above is unrelated to my comment.
Another approximation I have found with this implementation is that the ring end is not "connected" to its start.
For example, let's imagine this ring:
ring : xxxx..xxxxxxxxxxx..xxxx.xx
index: abcdefghijklmnopqrstuvwxyz
Where chains of x represent points from the same "walk", and . represents the noisy points meant to be filtered. The index alphabet is just for the explanation. So in the situation here, there are 3 walks:
- from 'g' to 'q'
- from 't' to 'w'
- from 'y' to 'd' (wrapped around)
However, in your implementation, we start at index 0 ('a') and the start and end of the ring are not considered connected. So you would find the walks:
- from 'a' to 'd'
- from 'g' to 'q'
- from 't' to 'w'
The algorithm would consider the walk 'y' -> 'z', but as it is too short it would ignore the points.
With the default parameter num_points_threshold_ = 4, up to 3 points could be missing because of that. With a bigger num_points_threshold_ value, the filter would sometimes produce a big blind spot in the data.
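The wrap-around idea can be sketched as follows, under the assumption that the points of a ring are already sorted by azimuth. Walk, split_walks_circular and walks_from_pattern are hypothetical names, and the linked predicate stands in for the filter's real neighbour check. After a normal linear split, the last walk is merged with the first one when the last and first points of the ring are linked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A walk is described by its first index and its number of points.
// After the merge step, a walk may run past the end of the ring and wrap to index 0.
struct Walk
{
  std::size_t first;
  std::size_t length;
};

// linked(i, j) must return true when points i and j belong to the same walk.
template <typename Linked>
std::vector<Walk> split_walks_circular(std::size_t n, Linked linked)
{
  std::vector<Walk> walks;
  if (n == 0) {
    return walks;
  }
  std::size_t first = 0;
  for (std::size_t i = 0; i + 1 < n; ++i) {
    if (!linked(i, i + 1)) {
      walks.push_back({first, i - first + 1});
      first = i + 1;
    }
  }
  walks.push_back({first, n - first});
  // Close the ring: if the last point links to the first one, the last and
  // first walks are really one walk that wraps around.
  if (walks.size() > 1 && linked(n - 1, 0)) {
    walks.back().length += walks.front().length;
    walks.erase(walks.begin());
  }
  return walks;
}

// Helper for the example in the comment: 'x' is a valid point, '.' is noise.
std::vector<Walk> walks_from_pattern(const std::string & ring)
{
  return split_walks_circular(ring.size(), [&ring](std::size_t i, std::size_t j) {
    return ring[i] == 'x' && ring[j] == 'x';
  });
}
```

With the pattern from the comment, the walk starting at 'y' (index 24) ends up with 6 points ('y', 'z', 'a' to 'd') instead of 2, so it would pass a threshold of 4. A caller iterating over such a walk has to take indices modulo the ring size.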
Since this problem has been around for a while and has nothing to do with this Pull Request, can you please create a new Issue?
Sorry I'm annoying with all my comments ;-)
Codecov Report
Patch coverage has no change; project coverage change: -0.15%
Coverage diff (main vs. #3014):
            main      #3014     +/-
Coverage    12.75%    12.61%    -0.15%
Files       1218      1223      +5
Lines       85981     86953     +972
Branches    24469     24469
Hits        10965     10965
Misses      63655     64627     +972
Partials    11361     11361
This pull request uses carry forward flags.
... and 12 files with indirect coverage changes
..._preprocessor/include/pointcloud_preprocessor/outlier_filter/ring_outlier_filter_nodelet.hpp
auto y = p1[1] - p2[1];
auto z = p1[2] - p2[2];

return x * x + y * y + z * z >= object_length_threshold_ * object_length_threshold_;
Further optimization idea:
Another way to compute the distance between the 2 points is to use the law of cosines: dist2(p1,p2) = d1^2 + d2^2 - 2*d1*d2*cos(a), where d1 and d2 are the distance attributes of p1 and p2, and a is the azimuth difference between the 2 points.
What is interesting with this approach is that a is always small. Indeed, since isCluster returns early when the walk has more than num_points_threshold_ points (3 by default), the current walk has 3 points or fewer. But to be added to the walk, the azimuth difference between 2 points must be less than 1 degree. So the value of a is at most 3 degrees, and thus cos(a) ~= 1.
The distance calculation can be simplified to: dist2(p1,p2) = (d1 - d2)^2.
-> we don't need to fetch the xyz points anymore; only the 2 distance attributes are necessary.
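The two formulas can be compared with a small sketch. The function names are hypothetical, the angle is taken in radians, and the azimuth difference is treated as the full angle between the two rays, which is only exact when both points lie at zero elevation. The term dropped by the approximation is 2*d1*d2*(1 - cos(a)), which grows with the square of the range.

```cpp
#include <cassert>
#include <cmath>

// Squared distance between two points from their ranges d1, d2 and the
// angle a_rad (radians) between them, using the law of cosines.
double dist2_law_of_cosines(double d1, double d2, double a_rad)
{
  return d1 * d1 + d2 * d2 - 2.0 * d1 * d2 * std::cos(a_rad);
}

// Small-angle approximation: with cos(a) ~= 1 the expression collapses to
// (d1 - d2)^2, so only the two range values are needed.
double dist2_small_angle(double d1, double d2)
{
  const double diff = d1 - d2;
  return diff * diff;
}
```

For two returns at the same 50 m range separated by 3 degrees, the exact squared distance is about 6.85 m^2 (a chord of roughly 2.6 m), while the approximation gives 0. The two checks can therefore disagree near object_length_threshold_ at long range, so the approximation is not a drop-in replacement for the Euclidean check.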
This proposal changes the logical output by approximation. The scope of this Pull Request is to speed up the processing without changing the logical output, so the proposal is beyond that scope. It would be appreciated if you could start a new issue for discussion and agreement.
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
This reverts commit 1463b7d.
@Shin-kyoto
@sykwer Thanks! I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.
I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.
I checked whether the number of point clouds per topic and the values (x, y, z) of each point in the point clouds remained unchanged before and after the pull request. I used the sample rosbag from the rosbag replay simulation tutorial.
The results are as follows:
- Number of point clouds per topic: No difference before and after the pull request.
- Values (x, y, z) of each point in the point clouds within the topic: No difference before and after the pull request for each of x, y, and z.
LGTM
…tion#3014)" This reverts commit 81e03ae.
I just found a small bug in the output pointcloud field.
The ring indices were not read correctly from point buffers. The ring/azimuth/etc. offsets were set to the indices of the respective fields in the input buffer's fields vector. The missing lookup of the offset in the respective index's field definition was added.
As discussed in autowarefoundation#3014, the law of cosines approach for calculating point distances in isCluster yields logically different results and thus needs more discussion before being merged. The isCluster implementation in this commit has been reverted to the previous Euclidean distance check.
The LiDAR point representation in the ad-hoc WalkInfo struct has been replaced by a more readable ad-hoc PointXYZAD struct representation. Once the input data has an enforced, consistent format, existing point types can be used instead.
Signed-off-by: Maximilian Schmeller <maximilian.schmeller@tier4.jp>
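The index-versus-offset mix-up described in this commit message can be illustrated with a minimal sketch. PointField below is a local stand-in that mirrors only the name and offset members of a point cloud field definition, and field_offset is a hypothetical helper. The byte offset has to be read from the field definition found at a given index; the index itself is not a byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Local stand-in for a point cloud field definition (only the members used here).
struct PointField
{
  std::string name;
  std::uint32_t offset;  // byte offset of the field inside one point
};

// Looks up the byte offset of the field called `name`.
// Returns false (and leaves offset_out untouched) if the field is absent.
bool field_offset(
  const std::vector<PointField> & fields, const std::string & name, std::uint32_t & offset_out)
{
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].name == name) {
      offset_out = fields[i].offset;  // the byte offset, not the index i
      return true;
    }
  }
  return false;
}
```

With fields x, y, z, intensity, ring at byte offsets 0, 4, 8, 16, 20, the ring field has index 4 but byte offset 20, so using the index as an offset would read from the middle of y.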
Description
This PR makes ring_outlier_filter faster without changing the logical output. The tail latency of the ring_outlier_filter node gets about 3x faster with the introduction of the TLSF allocator (see this Autoware Discussion) and this PR merged.
Measurement Condition
Related links
Tests performed
Check if ring_outlier_filter publishes the same logical output as before (using Autoware Universe rosbag simulation)
Notes for reviewers
Pre-review checklist for the PR author
The PR author must check the checkboxes below when creating the PR.
In-review checklist for the PR reviewers
The PR reviewers must check the checkboxes below before approval.
Post-review checklist for the PR author
The PR author must check the checkboxes below before merging.
After all checkboxes are checked, anyone who has write access can merge the PR.